Abstract

Competitive analysis fails to model locality of reference in the online paging problem.
To deal with it, Borodin et al. introduced the access graph model, which attempts to capture 
the locality of reference. However, the access graph model has a number of troubling aspects. 
The access graph has to be known in advance to the paging algorithm and the memory required 
to represent the access graph itself may be very large. 

In this paper we present truly online strongly competitive paging algorithms in the access graph model that do not have any prior information on the access sequence. We present both deterministic and randomized algorithms. The algorithms need only $O(k \log n)$ bits of memory, where k is the number of page slots available and n is the size of the virtual address space, i.e., asymptotically no more memory than is needed to store the virtual address translation table.

We also observe that our algorithms adapt themselves to temporal changes in the locality of reference. We model such changes by extending the access graph model to the so-called extended access graph model, in which many vertices of the graph can correspond to the same virtual page. We define a measure for the rate of change in the locality of reference in G, denoted by $\Lambda(G)$. We then show that our algorithms remain strongly competitive as long as $\Lambda(G) > (1+\varepsilon)k$, and that no truly online algorithm can be strongly competitive on a class of extended access graphs that includes all graphs G with $\Lambda(G) > k - o(k)$.



* Work done while the author was a Ph.D. student at Tel-Aviv University. Current affiliation: The Open University of Israel.
1 Introduction



1.1 The Paging Problem and Competitive Analysis 

The paging problem is a simplification of an optimization problem that arises in computer systems with virtual memory. In the paging problem, memory is partitioned into two levels: a small, fast memory, called real memory, and a large, slow memory, called virtual memory. The memory space is divided into equal-sized regions, called pages: k real memory pages and n virtual memory pages. Usually n is much larger than k.

Programs address the virtual memory, and the address translation mechanism translates it to 
a real memory address. Requests for virtual pages that are already in the real memory are called 
page hits. Whenever a requested virtual page is not in the real memory, a page fault occurs and 
the requested page is brought into the real memory. A page eviction strategy decides which page is to be evicted from the real memory in order to make room for the requested page. The goal of the strategy is to minimize the number of page faults, and its decisions must be made online, i.e., without knowing future requests. Such a strategy is called a paging algorithm.

If the paging algorithm has the entire request sequence in advance, i.e., it is not an online 
algorithm, a simple optimal solution due to Belady is as follows: Evict the page whose next use 
is furthest in the future. This strategy is called Opt. 
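As an illustration (a sketch, not taken from the paper), the following Python function counts the faults of Belady's rule on a finite request sequence, starting from an empty real memory with k slots. Pages may be any hashable values; the name `opt_faults` is hypothetical.

```python
from typing import Hashable, Sequence


def opt_faults(requests: Sequence[Hashable], k: int) -> int:
    """Number of page faults of Belady's rule (Opt) with k page slots,
    starting from an empty real memory."""
    n = len(requests)
    # next_use[i]: position of the next request for requests[i] after i (inf if none)
    next_use = [float("inf")] * n
    upcoming = {}
    for i in range(n - 1, -1, -1):
        next_use[i] = upcoming.get(requests[i], float("inf"))
        upcoming[requests[i]] = i
    resident = {}  # resident page -> position of its next request
    faults = 0
    for i, page in enumerate(requests):
        if page not in resident:
            faults += 1
            if len(resident) == k:
                # Evict the page whose next use is furthest in the future.
                victim = max(resident, key=resident.__getitem__)
                del resident[victim]
        resident[page] = next_use[i]
    return faults
```

For example, `opt_faults([1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5], 3)` evaluates to 7. The backward pass precomputes every "next use" once, so each eviction decision only compares the resident pages.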

In a seminal paper, Sleator and Tarjan [14] suggest using competitive analysis to measure the performance of online algorithms. Let $A(k,\sigma)$ denote the number of page faults a paging algorithm A incurs on the sequence σ using a real memory with k page slots, starting with no pages in real memory. If A is a randomized algorithm, then $A(k,\sigma)$ is a random variable.

Competitive analysis compares the cost of a given online algorithm to that of the optimal offline algorithm. In what follows, we describe the use of the competitive measure in the context of paging for randomized paging algorithms. We use the notion of the oblivious adversary [2], where the adversary knows the paging algorithm but not its random coin tosses.

A randomized online algorithm On is called strictly r-competitive if $\mathbb{E}[\mathrm{On}(k,\sigma)] \le r \cdot \mathrm{Opt}(k,\sigma)$ for every request sequence σ. The infimum of r for which On is strictly r-competitive is called the strict competitive ratio of On and is denoted by $r_{\mathrm{On}}(k)$. On is called asymptotically r-competitive if there exists a constant $C > 0$ such that on any request sequence σ, $\mathbb{E}[\mathrm{On}(k,\sigma)] \le r \cdot \mathrm{Opt}(k,\sigma) + C$. The infimum of r for which On is asymptotically r-competitive is called the asymptotic competitive ratio of On and is denoted by $r^\infty_{\mathrm{On}}(k)$. Obviously, $r^\infty_{\mathrm{On}}(k) \le r_{\mathrm{On}}(k)$.

As shown in [14], the best deterministic strict competitive ratio and the best deterministic asymptotic competitive ratio for paging with k page slots are both equal to k. Fiat et al. [7] proved that the asymptotic competitive ratio for randomized paging algorithms is $\Omega(\ln k)$ and the strict competitive ratio for randomized paging algorithms is $O(\ln k)$.

1.2 Locality of Reference 

Competitive analysis of paging algorithms does not model reality well. It fails to distinguish between algorithms that perform very differently in practice. For example, both the "Least Recently Used" (Lru) algorithm and the "First In First Out" (Fifo) algorithm have the optimal deterministic competitive ratio of k, but in practice Lru outperforms Fifo. Furthermore, the "observed competitive ratio" of Lru is usually only a constant, i.e., on typical request sequences its performance is worse than Opt by a constant (≈ 4) multiplicative factor [2].
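The two eviction rules can be made concrete with the following Python sketch (our illustration, not from the paper), which counts the faults of Lru and Fifo starting from an empty real memory. The function names are hypothetical, and the request sequence in the usage note is a standard textbook example chosen only to show that the two rules can behave differently on the same input; it says nothing about their behavior in practice.

```python
from collections import OrderedDict, deque
from typing import Hashable, Sequence


def lru_faults(requests: Sequence[Hashable], k: int) -> int:
    """Faults of Lru with k slots, starting from an empty real memory."""
    cache = OrderedDict()  # resident pages, least recently used first
    faults = 0
    for page in requests:
        if page in cache:
            cache.move_to_end(page)  # a hit makes the page most recently used
            continue
        faults += 1
        if len(cache) == k:
            cache.popitem(last=False)  # evict the least recently used page
        cache[page] = None
    return faults


def fifo_faults(requests: Sequence[Hashable], k: int) -> int:
    """Faults of Fifo with k slots, starting from an empty real memory."""
    resident = set()
    arrival = deque()  # resident pages in the order they were brought in
    faults = 0
    for page in requests:
        if page in resident:
            continue  # hits do not change the Fifo order
        faults += 1
        if len(resident) == k:
            resident.remove(arrival.popleft())  # evict the oldest arrival
        resident.add(page)
        arrival.append(page)
    return faults
```

On the sequence 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5 with k = 3, Lru incurs 10 faults and Fifo 9; with k = 4 they incur 8 and 10, respectively.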
A partial explanation for these phenomena is that programs exhibit locality of reference. Informally, locality of reference means that pages requested in the near past are likely to be requested in the near future, and that at any moment there is usually a small set of pages likely to be requested. Standard competitive analysis does not consider locality of reference, as it treats all possible request sequences the same. Thus, the competitive ratio is likely to be unrealistically high for algorithms that better exploit the locality of reference in the request sequence.

Motivated by this observation, Borodin, Irani, Raghavan and Schieber [3] suggest incorporating locality of reference into competitive analysis. In their model, the set of possible request sequences is limited to those derived from walks on a fixed access graph.

An access graph G = (V, E) for a program is a graph that has a vertex for each page in the virtual memory. Locality of reference is imposed by the adjacency relationships in the graph: a page v can be requested immediately after a page u only if there is an edge between u and v in the access graph. Hence, the possible request sequences are limited to those that correspond to paths in the access graph. Here we consider only undirected access graphs.
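A small Python check (an illustration, not part of the paper) of whether a request sequence is compatible with an undirected access graph given as an adjacency dictionary. It assumes that repeating the current page is always allowed, since such a request is a hit and needs no edge; this convention and the name `is_access_path` are assumptions of the sketch.

```python
from typing import Hashable, Mapping, Sequence, Set


def is_access_path(requests: Sequence[Hashable],
                   adj: Mapping[Hashable, Set[Hashable]]) -> bool:
    """True iff every request is a vertex of the graph and each request is
    either the previous page itself or a neighbor of it."""
    if any(page not in adj for page in requests):
        return False
    for u, v in zip(requests, requests[1:]):
        if u != v and v not in adj[u]:
            return False
    return True


# Access graph: the path a - b - c.
path_graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
```

With this graph, `is_access_path("abcbba", path_graph)` holds, while `"ac"` is rejected because a and c are not adjacent.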

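The competitive ratio of a paging algorithm A now depends on the access graph G. Let paths(G) denote the set of finite-length paths in G. Then

$$r_A(G,k) = \inf\{ r : \forall \sigma \in \mathrm{paths}(G),\ \mathbb{E}[A(k,\sigma)] \le r \cdot \mathrm{Opt}(k,\sigma) \},$$

$$r^\infty_A(G,k) = \inf\{ r : \exists C > 0\ \forall \sigma \in \mathrm{paths}(G),\ \mathbb{E}[A(k,\sigma)] \le r \cdot \mathrm{Opt}(k,\sigma) + C \}.$$
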
We define the following terminology and notation: 

• The deterministic competitive ratio of a paging problem,

$$r(G,k) = \inf_A r_A(G,k),$$

where A ranges over the deterministic online paging algorithms.

• The deterministic asymptotic competitive ratio of a paging problem,

$$r^\infty(G,k) = \inf_A r^\infty_A(G,k),$$

where A ranges over the deterministic online paging algorithms.

• The randomized competitive ratio of a paging problem,

$$r_{\mathrm{obl}}(G,k) = \inf_A r_A(G,k),$$

where A ranges over the randomized online paging algorithms. The subscript obl indicates the use of the oblivious adversary model.

• The randomized asymptotic competitive ratio of a paging problem,

$$r^\infty_{\mathrm{obl}}(G,k) = \inf_A r^\infty_A(G,k),$$

where A ranges over the randomized online paging algorithms.
We are interested in uniform online algorithms that are given the access graph G as their input (before receiving the request sequence) and work in poly(|G|, i) time for the ith request. We adopt the convention from [10] and define a uniform online paging algorithm A to be very strongly competitive if its competitive ratio for any paging problem is bounded from above by a fixed linear function of the asymptotic competitive ratio of the paging problem, i.e., there exist $b_1, b_2 > 0$ such that for every k and G,

$$
r_A(G,k) \le
\begin{cases}
b_1 \, r^\infty(G,k) + b_2 & \text{if } A \text{ is deterministic,} \\
b_1 \, r^\infty_{\mathrm{obl}}(G,k) + b_2 & \text{if } A \text{ is randomized.}
\end{cases}
\tag{1}
$$

See Section 1.5 for a discussion of the choice of this definition.

1.3 Truly Online Algorithms

A problematic aspect of previous algorithms for the access graph model, such as those in [4, 10, 6], is the assumption that the access graph is given in advance. This assumption has the following obvious drawbacks:

• It is not clear how the paging algorithm gets hold of the access graph. One suggested solution is to gather information on the program's access graph during the compile phase, but this argument is only partially satisfactory.

• The storage required just to represent the access graph is at least a constant fraction of the virtual memory size, and may even be a constant fraction of the virtual memory size squared.

In contrast, algorithms such as Lru, Fifo [14] and Rmark [7], which are "oblivious" to the underlying access graph, do not have these problems. We call such algorithms truly online algorithms.

Definition 1.1. A uniform online paging algorithm On is called truly online if it does not get the underlying access graph as an input, and only gets the page request sequence (in an online fashion). More formally, let A be a uniform paging algorithm, and denote by $A(G,k)$ the strategy of this algorithm tailored for access graph G and a cache of size k. A is called truly online if for any two access graphs $G_1$ and $G_2$ on the same vertex set, any $k \in \mathbb{N}$, and any request sequence σ compatible with both $G_1$ and $G_2$, $A(G_1,k)$ and $A(G_2,k)$ produce the same distribution when applied to σ.

Classic paging algorithms such as Lru and Fifo are truly online but not strongly competitive, as demonstrated in [3]. The existence of truly online very strongly competitive algorithms is not at all obvious. Nonetheless, in this paper we present two paging algorithms, a deterministic algorithm and a randomized algorithm, with the following desirable properties:

1. Both algorithms are truly online and very strongly competitive. This implies that knowing the access graph is not necessary for "almost optimal" online algorithms.

2. Storage requirements are only $O(k \log n)$ bits, compared to a naive implementation that stores the whole access graph and needs $\Omega(n^2 \log n)$ bits in the worst case. Using randomization, we can reduce the space requirement even further to $O(k \log k)$ bits.
3. Both algorithms can be implemented fairly efficiently to deal with page hits. In fact, their hardware requirements for implementing page hits are comparable to the complexity of implementing Lru in hardware.¹

4. Both algorithms are adaptive: if the page sequence exhibits different behavior over time, the algorithms adapt to these changes. Unfortunately, locality of reference as captured in the access graph model is fixed, and therefore the access graph model does not explain all properties of our algorithms. In the next section (Section 1.4) we consider a model for "changing locality of reference", which reveals the full strength of our algorithms.

1.4 Refined Locality of Reference 

We seek a model that allows one to deal with changing behavioral patterns of the access sequence 
over time. For example, a compiler may run in many stages, with entirely different local behavior 
in the different stages. Because much of the execution of software is performed in the operating 
system (I/O processing), a common access graph would show that certain pages are accessed from 
all over the address space, essentially losing much of the information about locality of reference. 

To deal with this we allow multiple appearances of virtual page labels in the access graph. The 
same page label may appear on many different vertices. The access sequence is constrained to obey 
the locality conditions imposed by the edge relations in the graph. I.e., every access sequence is 
derived from a path in the graph. We call this model the extended access graph model. 

For a given extended access graph G we define the parameter $\Lambda(G)$ to be the length of the shortest path in G between two different vertices labeled with the same page. Observe that $\Lambda(G)$ is the minimum number of requests to different pages needed to separate requests for the same page that have different sets of neighbors. Intuitively, $\Lambda(G)$ indicates how quickly the locality of reference changes.
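$\Lambda(G)$ can be computed by a breadth-first search from every vertex. The Python sketch below is our illustration; the function name `lambda_param` and the input encoding (an adjacency dictionary for the extended access graph plus a `label` map from vertices to virtual pages) are assumptions. It returns infinity when no page label is repeated.

```python
import math
from collections import deque
from typing import Dict, Hashable, Set


def lambda_param(adj: Dict[Hashable, Set[Hashable]],
                 label: Dict[Hashable, Hashable]) -> float:
    """Shortest-path distance between two distinct vertices with the same page label."""
    best = math.inf
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            if dist[u] + 1 >= best:
                break  # BFS order: nothing reached later from this source can improve best
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    if label[w] == label[source]:  # w != source, same virtual page
                        best = min(best, dist[w])
                    queue.append(w)
    return best


# A path on five vertices whose two endpoints both carry page "a".
path5 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
labels = {0: "a", 1: "b", 2: "c", 3: "d", 4: "a"}
```

Here `lambda_param(path5, labels)` is 4, the distance between the two vertices labeled "a".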

As we shall see, our algorithms perform quite well with respect to $\Lambda(G)$. Specifically, we show that our algorithms are very strongly competitive with respect to the family of all extended access graphs with $\Lambda(G) > (1+\varepsilon)k$. We also prove an almost matching impossibility result: there exists a family of extended access graphs with $\Lambda(G) > k - o(k)$ such that no truly online algorithm can be very strongly competitive on this family.

1.5 Very Strong Competitiveness vs. Strong Competitiveness

The definition of very strong competitiveness (Eq. (1)) uses the strict competitive ratio on the left-hand side, yet makes use of the asymptotic competitive ratio on the right-hand side.

A more commonly used measure in previous literature [10, 6] is the following weaker notion of strong competitiveness. A truly online algorithm A is called strongly competitive if there exist $b_1, b_2 > 0$ such that for every k and G,

$$
r^\infty_A(G,k) \le
\begin{cases}
b_1 \, r^\infty(G,k) + b_2 & \text{if } A \text{ is deterministic,} \\
b_1 \, r^\infty_{\mathrm{obl}}(G,k) + b_2 & \text{if } A \text{ is randomized.}
\end{cases}
$$

In this section we clarify our choice. First, we note that very strong competitiveness implies strong competitiveness. We also note that the proofs of strong competitiveness in [10, 6] actually imply very strong competitiveness.

¹ Processing of page faults is more complicated than the processing required by Lru. Arguably, this is less important, since page faults are relatively infrequent and are accompanied by a large I/O overhead anyway, so the cost of implementing the page fault logic in software is relatively insignificant.
We give upper bounds on the strict competitive ratio and lower bounds on the asymptotic competitive ratio. An upper bound on the strict competitive ratio is also an upper bound on the asymptotic competitive ratio. Likewise, a lower bound on the asymptotic competitive ratio implies a lower bound on the strict competitive ratio. Thus, our results are the strongest possible among the various variants.

When considering uniform (not truly online) algorithms, one should be careful when using strong competitiveness. In this case, a uniform algorithm could compute an optimal asymptotically competitive online strategy (see [4]) by amortizing a computation that is long in terms of |G| over a long prefix of the request sequence, and using a large additive constant C to cover the cost incurred while processing that prefix.

Here we avoid this problem by using the strict competitive ratio. Any algorithm with a "good" 
strict competitive ratio avoids the potential pitfall of simply waiting sufficiently long so as to learn 
the page request distribution. 

We next argue that truly online strongly competitive algorithms follow easily from the existing uniform algorithms in [4, 10, 6]. Let A be one of the uniform algorithms from [4, 10, 6]. Execute A on "the observed access graph" so far, i.e., the graph that contains all edges used thus far in the prefix of the request sequence. The resulting algorithm is clearly truly online.
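The observed access graph is easy to maintain online. The following Python sketch (class and method names are hypothetical) records an edge uv whenever v is requested immediately after a different page u, and reports whether a request revealed a new edge. A uniform algorithm would then be run on `adj`; only the graph bookkeeping is shown here.

```python
from typing import Dict, Hashable, Optional, Set


class ObservedAccessGraph:
    """Undirected graph of all edges used so far by the request sequence."""

    def __init__(self) -> None:
        self.adj: Dict[Hashable, Set[Hashable]] = {}
        self.last: Optional[Hashable] = None

    def observe(self, page: Hashable) -> bool:
        """Record one request; return True iff it revealed a new edge."""
        self.adj.setdefault(page, set())
        prev, self.last = self.last, page
        if prev is None or prev == page or page in self.adj[prev]:
            return False
        self.adj[prev].add(page)
        self.adj[page].add(prev)
        return True

    def edge_count(self) -> int:
        return sum(len(nbrs) for nbrs in self.adj.values()) // 2
```

Since an n-vertex graph has at most $\binom{n}{2}$ edges, `observe` returns True at most $\binom{n}{2}$ times over the whole request sequence, which is the counting behind the additive constant discussed next.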

To see that the resulting algorithm is also strongly competitive, observe that the algorithms of [4, 10, 6] have the marking property, and furthermore, the proof that they are O(r)-competitive uses the argument that in any phase with g new pages they fault at most O(rg) times. Hence, when analyzing their truly online counterparts, we observe that in phases in which no new edges of the access graph are revealed, these algorithms fault at most

$O(g \cdot r')$ times, where $r'$ is the competitive ratio of the access graph observed thus far,

and by Proposition 2.9, $r'$ is at most the corresponding ratio for the whole access graph. In phases during which new edges of the access graph are revealed, these algorithms fault at most k times (as does any marking algorithm). As there are at most $\binom{n}{2}$ phases in which new edges of the access graph can be revealed, we can use $C = O(n^2 k)$ as the additive constant in the definition of the asymptotic competitive ratio, thus showing that these algorithms are strongly competitive.

This type of solution has the following drawbacks: (i) the additive term can be huge, as n is typically much larger than k; (ii) it requires $\Omega(n^2)$ memory; (iii) it does not extend to the extended access graph model. Therefore, in the remainder of this paper we consider only very strong competitiveness.

1.6 Related Work 

Borodin et al. [1] introduce the access graph model. They present some basic facts about it and investigate popular algorithms like Lru and Fifo in this context. In particular, they prove that the competitive ratio of Lru is at most twice the competitive ratio of Fifo for the same access graph. They also show that Lru performs badly on access graphs with cycles of size k + 1. Later, Chrobak and Noga [3] proved that Lru is better than Fifo in this model, i.e., $r_{\mathrm{Lru}}(G,k) \le r_{\mathrm{Fifo}}(G,k)$ for any access graph G and k.
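The bad case for Lru on a cycle of k + 1 pages is easy to reproduce. In the following Python sketch (our illustration; the names and the choice k = 4, n = 50 are arbitrary), the requests walk around a cycle of k + 1 pages. Lru always evicts the least recently used page, which is exactly the page requested next, so it faults on every request. Belady's rule evicts the page requested just before the missing one, so after the first k faults it faults only once every k requests.

```python
from collections import OrderedDict
from typing import Hashable, Sequence


def lru_faults(requests: Sequence[Hashable], k: int) -> int:
    cache = OrderedDict()  # least recently used first
    faults = 0
    for page in requests:
        if page in cache:
            cache.move_to_end(page)
        else:
            faults += 1
            if len(cache) == k:
                cache.popitem(last=False)
            cache[page] = None
    return faults


def opt_faults(requests: Sequence[Hashable], k: int) -> int:
    next_use, upcoming = [float("inf")] * len(requests), {}
    for i in range(len(requests) - 1, -1, -1):
        next_use[i] = upcoming.get(requests[i], float("inf"))
        upcoming[requests[i]] = i
    resident, faults = {}, 0  # resident page -> position of its next request
    for i, page in enumerate(requests):
        if page not in resident:
            faults += 1
            if len(resident) == k:
                del resident[max(resident, key=resident.__getitem__)]
        resident[page] = next_use[i]
    return faults


k, n = 4, 50
cycle_walk = [i % (k + 1) for i in range(n)]  # a walk around a cycle of k + 1 pages
```

On this walk Lru faults 50 times, while Belady's rule faults $k + \lceil (n-k)/k \rceil = 16$ times.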

Borodin et al. [1] also consider deterministic uniform paging algorithms. They prove the existence of an optimal paging algorithm in PSPACE(|G|). They give a natural uniform paging algorithm, called Far, and prove that Far obtains a competitive ratio no worse than $O(\log k)$ times the asymptotic competitive ratio for the graph. This result is improved in a paper by Irani, Karlin and Phillips [10], in which it is shown that Far is very strongly competitive. The same paper also presents a very strongly competitive algorithm for a subclass of directed access graphs, called tree-connected directed cycles.

Fiat and Karlin [5] present a strongly competitive randomized algorithm, and a strongly competitive algorithm for multi-pointer paging (where the page requests come from more than one source). The latter gives an alternative deterministic strongly competitive algorithm. These algorithms of Fiat and Karlin, deterministic and randomized, are the basis of this paper.

Karlin, Phillips and Raghavan [11] consider a paging problem where the input to the paging algorithm is a Markov chain whose states correspond to pages, with transition probabilities $(p_{ij})_{i,j}$, where $p_{ij}$ is the probability that page j is referenced just after page i. They show a paging algorithm that is within a constant multiplicative factor of the optimal online algorithm when the request sequences are generated by the Markov chain. A simpler and better algorithm for Markov paging and its generalizations was given by Lund, Phillips and Reingold [13].

Fiat and Rosen [2] present an access-graph-based heuristic that is truly online and makes use of a (weighted) dynamic access graph. In this sense we emulate their concept. While the Fiat and Rosen algorithm is experimentally interesting in that it seems to beat Lru, it is certainly not strongly competitive, and is known to have a competitive ratio of $\Omega(k \log k)$.

Much of the above work is summarized in chap. 3-5]. 

2 Preliminaries 

A crucial concept in this paper is the phase-partitioning of the request sequence. 

Definition 2.1 (Phase partitioning [7]). The request sequence is partitioned into disjoint contiguous subsequences, called phases, as follows. The first phase begins at the beginning of the sequence. The ith phase begins immediately after the (i−1)st phase ends, and it ends either at the end of the sequence or just before the request for the (k+1)st distinct page during the ith phase, whichever comes first. Note that phase partitioning can be done in an online fashion.

A new page for the ith phase is a page that is requested in the ith phase, where either i = 1 or the page was not requested in the (i−1)st phase. The following lemma clarifies the importance of the phase partitioning of request sequences.
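Definition 2.1 and the notion of a new page translate directly into code. The Python sketch below is our illustration (the function names are hypothetical): `phases` processes one request at a time, as the online remark in the definition suggests, and `new_page_counts` returns the number of new pages $g_i$ of each phase.

```python
from typing import Hashable, List, Sequence


def phases(requests: Sequence[Hashable], k: int) -> List[List[Hashable]]:
    """Partition the request sequence into phases (Definition 2.1), one request at a time."""
    result: List[List[Hashable]] = []
    current: List[Hashable] = []
    distinct = set()
    for page in requests:
        if page not in distinct and len(distinct) == k:
            # This request would be the (k+1)st distinct page: close the phase.
            result.append(current)
            current, distinct = [], set()
        current.append(page)
        distinct.add(page)
    if current:
        result.append(current)  # the last phase may have fewer than k distinct pages
    return result


def new_page_counts(phase_list: List[List[Hashable]]) -> List[int]:
    """g_i for every phase: pages of phase i that were not requested in phase i-1."""
    counts, previous = [], set()
    for phase in phase_list:
        pages = set(phase)
        counts.append(len(pages - previous))  # previous is empty for the first phase
        previous = pages
    return counts
```

For example, `phases("abcabc", 2)` gives `[['a', 'b'], ['c', 'a'], ['b', 'c']]`, whose new-page counts are `[2, 1, 1]`.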

Lemma 2.2. Let σ be a request sequence composed of m phases, where the ith phase has $g_i$ new pages. Then $g_i \ge 1$ and

$$\frac{1}{2}\sum_{i=1}^{m} g_i \le \mathrm{Opt}(k,\sigma) \le \sum_{i=1}^{m} g_i.$$

Therefore, to prove that an online algorithm is strictly r-competitive, it is sufficient to show that in a phase with g new pages the online algorithm faults at most rg/2 times.
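Lemma 2.2 can be spot-checked numerically (this checks examples, it does not prove anything). The self-contained Python sketch below is our illustration with hypothetical names: it computes the new-page counts $g_i$ and Belady's fault count on random request sequences and tests both inequalities together with $g_i \ge 1$.

```python
import random
from typing import Hashable, List, Sequence


def phase_new_pages(requests: Sequence[Hashable], k: int) -> List[int]:
    """g_i for every phase of Definition 2.1."""
    counts: List[int] = []
    previous, current = set(), set()
    for page in requests:
        if page not in current and len(current) == k:
            counts.append(len(current - previous))  # close the current phase
            previous, current = current, set()
        current.add(page)
    if current:
        counts.append(len(current - previous))
    return counts


def opt_faults(requests: Sequence[Hashable], k: int) -> int:
    """Belady's rule from an empty real memory."""
    next_use, upcoming = [float("inf")] * len(requests), {}
    for i in range(len(requests) - 1, -1, -1):
        next_use[i] = upcoming.get(requests[i], float("inf"))
        upcoming[requests[i]] = i
    resident, faults = {}, 0
    for i, page in enumerate(requests):
        if page not in resident:
            faults += 1
            if len(resident) == k:
                del resident[max(resident, key=resident.__getitem__)]
        resident[page] = next_use[i]
    return faults


def lemma_2_2_holds(requests: Sequence[Hashable], k: int) -> bool:
    g = phase_new_pages(requests, k)
    opt = opt_faults(requests, k)
    return all(gi >= 1 for gi in g) and sum(g) / 2 <= opt <= sum(g)
```

For instance, `lemma_2_2_holds("abcabc", 2)` is True: the phases have new-page counts 2, 1, 1 and Belady's rule faults 4 times.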

A page that has already been requested during the current phase is called marked. Marks are erased at the end of the phase. An online algorithm is said to have the marking property if it never evicts a marked page. Marking algorithms differ only in the eviction strategy used for unmarked pages. Note that a marking algorithm faults at most k times in a phase, since each of the at most k distinct pages of the phase, once requested, stays in real memory until the phase ends. All the algorithms we consider in this paper are marking algorithms.
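A generic marking algorithm can be sketched in Python as follows (our illustration, not the paper's code; the names are hypothetical). The eviction rule among unmarked resident pages is a parameter `choose`; the example plugs in a uniformly random choice, in the spirit of Rmark, and also the deterministic rule `min`. The sketch records the faults of each phase so that the bound of k faults per phase can be checked.

```python
import random
from typing import Callable, Hashable, List, Sequence, Set

Chooser = Callable[[Set[Hashable]], Hashable]


def marking_faults_per_phase(requests: Sequence[Hashable], k: int,
                             choose: Chooser) -> List[int]:
    """Run a marking algorithm; `choose` picks a victim among the unmarked resident pages."""
    resident: Set[Hashable] = set()
    marked: Set[Hashable] = set()
    per_phase = [0]
    for page in requests:
        if page not in marked and len(marked) == k:
            # (k+1)st distinct page of the phase: a new phase starts, marks are erased.
            marked.clear()
            per_phase.append(0)
        if page not in resident:
            per_phase[-1] += 1
            if len(resident) == k:
                # At most k-1 pages are marked here, so an unmarked victim exists.
                resident.remove(choose(resident - marked))
            resident.add(page)
        marked.add(page)
    return per_phase


rng = random.Random(1)


def random_unmarked(candidates: Set[Hashable]) -> Hashable:
    return rng.choice(sorted(candidates))  # sorting makes runs reproducible for a fixed seed
```

With the deterministic rule, `marking_faults_per_phase([1, 2, 3, 1, 2, 3], 2, min)` returns `[2, 2, 1]`.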
For marking algorithms, we use the term hole to denote a page that was requested during the previous phase, evicted during the current phase, and not yet requested during the current phase. A page is called stale if it was requested in the previous phase and has not yet been requested or evicted in the current phase.

Borodin et al. [1] present the following lower bounds on the asymptotic competitive ratio. Let $\ell(T)$ denote the number of leaves in a tree T, and let $\mathcal{T}_i(G)$ denote the set of i-vertex subtrees of the graph G.

Lemma 2.3 ([4]). For any access graph G and k page slots,

$$r^\infty(G,k) \ge \max\{\ell(T) - 1 \mid T \in \mathcal{T}_{k+1}(G)\},$$

$$r^\infty_{\mathrm{obl}}(G,k) \ge \max\{H_{\ell(T)-1} \mid T \in \mathcal{T}_{k+1}(G)\},$$

where $H_n = \sum_{i=1}^{n} i^{-1}$.

We obtain an estimate of the number of leaves that can be found in a sub-tree of a given graph 
G by the following proposition, (see and references therein). 

Proposition 2.4. Let G = (V, E) be a connected graph with $k < |V| < 2k$ and ℓ vertices of degree other than two. Then there exists a subtree of G on k + 1 vertices with at least ℓ/30 leaves.

Borodin et al. [1] present another lower bound using the notion of vine decomposition. 

Definition 2.5 ([4]). A vine decomposition $\mathcal{V} = (B, \mathcal{P})$ of a graph G is a connected subgraph B together with a set of paths $\mathcal{P} = \{p_1, p_2, \ldots\}$ in G such that (i) the endpoints of paths in $\mathcal{P}$ are adjacent to vertices in B; (ii) the set of vertices appearing in paths in $\mathcal{P}$ is disjoint from B; (iii) the paths in $\mathcal{P}$ are pairwise vertex-disjoint. B is called the backbone of $\mathcal{V}$. For a path (vine) p, denote by |p| the number of vertices in p plus one, i.e., the number of edges in p including those connecting it to B. Define the value of a vine decomposition $\mathcal{V} = (B, \mathcal{P})$ to be

$$\nu(\mathcal{V}) = \sum_{p \in \mathcal{P}} \log |p|.$$
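The following Python sketch (our illustration; the name `vine_value` is hypothetical) checks conditions (i) to (iii) of Definition 2.5 for a candidate decomposition of an undirected graph and returns its value. The logarithm base is not fixed in the text; the sketch assumes base 2.

```python
import math
from collections import deque
from typing import Dict, Hashable, Iterable, List, Sequence, Set

Graph = Dict[Hashable, Set[Hashable]]


def vine_value(adj: Graph, backbone: Iterable[Hashable],
               vines: Sequence[List[Hashable]]) -> float:
    """Validate a vine decomposition (B, P) and return sum of log2 |p| over its vines."""
    B = set(backbone)
    if not B:
        raise ValueError("empty backbone")
    start = next(iter(B))  # B must induce a connected subgraph of G
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w in B and w not in seen:
                seen.add(w)
                queue.append(w)
    if seen != B:
        raise ValueError("backbone is not connected")
    used: Set[Hashable] = set()
    for p in vines:
        if not p or len(set(p)) != len(p):
            raise ValueError("a vine must be a non-empty simple path")
        if any(b not in adj[a] for a, b in zip(p, p[1:])):
            raise ValueError("a vine is not a path of G")
        if not (adj[p[0]] & B and adj[p[-1]] & B):
            raise ValueError("a vine endpoint is not adjacent to B")  # condition (i)
        if B & set(p):
            raise ValueError("a vine meets the backbone")  # condition (ii)
        if used & set(p):
            raise ValueError("vines are not vertex-disjoint")  # condition (iii)
        used |= set(p)
    # |p| = vertices of p plus one = edges of p plus the two edges attaching it to B
    return sum(math.log2(len(p) + 1) for p in vines)


cycle4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
```

On the 4-cycle with backbone {0}, the single vine [1, 2, 3] has |p| = 4, so `vine_value(cycle4, {0}, [[1, 2, 3]])` is 2.0.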

Lemma 2.6. Denote by $\mathcal{H}_i(G)$ the set of vine decompositions of i-vertex subgraphs of G. Then $r^\infty(G,k) \ge \max\{\nu(\mathcal{V}) \mid \mathcal{V} \in \mathcal{H}_{k+1}(G)\}$.

The following lower bound on the asymptotic competitive ratio is useful when the access graph 
contains a "large" cycle. 

Lemma 2.7 ([10]). If $(B,\mathcal{P}) \in \mathcal{H}_{k+g}(G)$ and $g \ge 1$, then

$$r^\infty(G,k) \ge \left\lfloor \max_{p \in \mathcal{P}} \log(|p| - 1) - \log g \right\rfloor / 2.$$

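An analogous lower bound for randomized algorithms is due to Fiat and Karlin [6]:

Lemma 2.8. For any $(B,\mathcal{P}) \in \mathcal{H}_{k+g}(G)$ with at least 2g vertices in $\mathcal{P}$, where $g \ge 1$,

$$r^\infty_{\mathrm{obl}}(G,k) = \Omega\Big(\log\Big(\sum_{p \in \mathcal{P}} |p|\Big) - \log g\Big).$$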

The following proposition is immediate. 
Proposition 2.9. If G is a subgraph of G', then $r^\infty_{\mathrm{obl}}(G,k) \le r^\infty_{\mathrm{obl}}(G',k)$ and $r^\infty(G,k) \le r^\infty(G',k)$.
3 Randomized Algorithms



Our algorithms are similar to Fiat and Karlin's algorithms but they do not have the access 
graph available in advance. Instead, they make use of a spanning tree of the graph resulting from 
the request sequence of the previous phase. 

Let P denote the set of pages requested in the previous phase. Let $G_P = (P, E)$, where $E = \{uv \mid u, v \in P \text{ requested successively in the previous phase}\}$, and let $G_0 = (V_0 = P, E_0)$ be a spanning tree of $G_P$. Let $r_0$ denote the last page requested in the previous phase and let $r_i$, $i \ge 1$, denote the ith page request in the current phase. Define $G_{i+1} = (V_{i+1}, E_{i+1})$, where $V_{i+1} = V_i \cup \{r_{i+1}\}$ and $E_{i+1} = (E_i \cup \{r_i r_{i+1}\}) \setminus \{r_i r_i\}$.
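The graphs $G_P$, $G_0$ and $G_i$ can be built as in the following Python sketch (our illustration; all names are hypothetical). The text only requires some spanning tree of $G_P$; choosing a breadth-first spanning tree rooted at the first page of the previous phase is an assumption of the sketch.

```python
from collections import deque
from typing import Dict, Hashable, Sequence, Set

Graph = Dict[Hashable, Set[Hashable]]


def previous_phase_graph(prev: Sequence[Hashable]) -> Graph:
    """G_P: vertices are the pages of the previous phase, edges join successive requests."""
    gp: Graph = {page: set() for page in prev}
    for u, v in zip(prev, prev[1:]):
        if u != v:
            gp[u].add(v)
            gp[v].add(u)
    return gp


def bfs_spanning_tree(g: Graph, root: Hashable) -> Graph:
    """One possible G_0: a BFS spanning tree of the connected graph g."""
    tree: Graph = {v: set() for v in g}
    seen, queue = {root}, deque([root])
    while queue:
        u = queue.popleft()
        for w in sorted(g[u], key=repr):  # fixed order for reproducibility
            if w not in seen:
                seen.add(w)
                tree[u].add(w)
                tree[w].add(u)
                queue.append(w)
    return tree


def extend(g: Graph, r_prev: Hashable, r_next: Hashable) -> None:
    """G_i -> G_{i+1} in place: add vertex r_next and edge r_prev r_next (no self-loop)."""
    g.setdefault(r_next, set())
    if r_prev != r_next:
        g[r_prev].add(r_next)
        g[r_next].add(r_prev)
```

$G_P$ is connected because the previous phase's requests form a walk in it, so the search reaches every page. For the previous phase a, b, c, a, b, $G_P$ is a triangle and the sketch keeps the tree edges ab and ac; a first request d in the current phase then adds the edge bd, since $r_0 = b$.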

Similarly to Fiat and Karlin's algorithms, our algorithms are marking algorithms that divide each phase into three sub-phases, with a different page eviction strategy in each. Let $G_{II}$ denote $G_i$ at the end of sub-phase II, and $G_{III}$ denote $G_i$ at the end of the phase.

In this section we present and analyze Rto, a truly online randomized paging algorithm. In the first two sub-phases of Rto, a vine decomposition is constructed in $G_{II}$ such that its backbone consists of the marked vertices and the evicted vertices. The third sub-phase evicts vertices randomly from the paths of this vine decomposition.

Algorithm Rto(k). Rto is a marking algorithm that partitions each phase into three consecutive sub-phases, acting in each sub-phase as described below. We emphasize that the graph referred to in the following description is $G_0$, i.e., a spanning tree of $G_P$.

Subphase I: Denote by C the set of vertices of degree not equal to two in $G_0$. On a fault, evict a random unmarked unevicted (stale) page $v \in C$. If there is no such page and the phase is not over, proceed to sub-phase II.

Subphase II: At the beginning of the sub-phase, all stale pages lie on degree-2 vertices in $G_0$. Denote by $C' \subseteq C$ the set of holes at the beginning of sub-phase II. For each $v \in C'$ we maintain a dynamic set $A_v$; at the beginning of the sub-phase, $A_v = \{v\}$. A vertex $v \in C'$ is called "alive" as long as $A_v$ contains only holes and there exists a stale vertex adjacent to $A_v$ in $G_0$.

On a fault, choose v such that

1. $v \in C'$ and v is alive;

2. v minimizes $|A_u|$ among all candidate vertices u meeting condition 1.

If no v meets these criteria, proceed to sub-phase III. Otherwise, evict a stale vertex w adjacent to $A_v$, and set $A_v \leftarrow A_v \cup \{w\}$.

Subphase III: On a fault, evict a random unmarked page.
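A single fault of sub-phase II can be sketched in Python as follows (our illustration; names are hypothetical). The caller is assumed to keep `holes` and `stale` up to date, in particular to remove a page from `holes` when it is requested, which is what kills the corresponding $A_v$. Breaking ties by `repr` and picking the stale neighbor w with the smallest `repr` are assumptions; the text allows any choice among equals.

```python
from typing import Dict, Hashable, Optional, Set, Tuple


def subphase2_step(
    adj: Dict[Hashable, Set[Hashable]],  # the tree G_0
    A: Dict[Hashable, Set[Hashable]],    # v in C' -> current set A_v
    holes: Set[Hashable],
    stale: Set[Hashable],
) -> Optional[Tuple[Hashable, Hashable]]:
    """One fault of Rto's sub-phase II; mutates A, holes and stale.

    Returns (v, w) if w was evicted on behalf of v, or None when no vertex
    of C' is alive, i.e., the algorithm moves on to sub-phase III.
    """
    def stale_neighbors(v: Hashable) -> Set[Hashable]:
        return {w for u in A[v] for w in adj[u] if w in stale}

    alive = [v for v in A if A[v] <= holes and stale_neighbors(v)]
    if not alive:
        return None
    v = min(alive, key=lambda u: (len(A[u]), repr(u)))  # smallest A_v first
    w = min(stale_neighbors(v), key=repr)
    stale.discard(w)  # w is evicted, so it is no longer stale
    holes.add(w)      # and, not having been requested, it becomes a hole
    A[v].add(w)
    return v, w
```

On the path c1 - s1 - s2 - c2 with holes c1, c2 and stale pages s1, s2, three consecutive calls return ('c1', 's1'), ('c2', 's2') and None: the second eviction goes to c2 because its set is smaller, which is the balancing used in the analysis below.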
Competitive Analysis 

The analysis of Rto follows the analysis of the randomized algorithm from 

Assume that g new pages are requested during the phase, and let $f_i$ denote the expected number of pages evicted during sub-phase i that will be requested (later) during the phase. The expected total number of faults in the phase is at most $f_1 + f_2 + f_3 + g$. We will show that $f_i = O(g \cdot r^\infty_{\mathrm{obl}}(G_{III},k))$. As $G_{III}$ is a subgraph of the underlying access graph G, it follows from Proposition 2.9 and Lemma 2.2 that Rto is very strongly competitive on G.

Sub-phase I faults: Let $S \subseteq C$ be the set of vertices evicted in sub-phase I. Suppose the adversary requests the vertices in S during the phase in the order $s_1, s_2, \ldots, s_m$.

Proposition 3.1. The probability that Rto has a hole at $s_i$ at the time $s_i$ is requested is at most $g/(|C| - i + 1)$.

Proof. At any point in time, Rto has at most g holes. It is easy to prove by induction on the number of requests since the beginning of the phase that after $s_{i-1}$ has been requested and before $s_i$ is requested, the holes are evenly distributed among $C \setminus \{s_1, \ldots, s_{i-1}\}$. For requests during sub-phase I, this follows from the page eviction strategy of sub-phase I. For requests after sub-phase I, the vertices of $C \setminus \{s_1, \ldots, s_i\}$ are all holes, and $|C \setminus \{s_1, \ldots, s_i\}| \le g - 1$. In either case, the probability of a hole at $s_i$ is at most $g/(|C| - i + 1)$. □

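The expected number of evicted pages in C that are later requested is therefore at most $\sum_{i=1}^{m} g/(|C| - i + 1) \le g H_{|C|}$. $G_1$ is a tree on k + 1 vertices with $\Omega(|C|)$ leaves, and by Lemma 2.3, $r^\infty_{\mathrm{obl}}(G_1,k) = \Omega(H_{|C|})$. Thus, $f_1 = O(g \cdot r^\infty_{\mathrm{obl}}(G_1,k))$.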

Sub-phase II faults: Note that $|C'| \le g - 1$. Let $a_v$ denote $|A_v|$ at the time immediately before $v \in C'$ dies. Note that $f_2 = \sum_{v \in C'} (a_v - 1)$. Note also that throughout sub-phase II: (i) $\sum |A_v| \le g$, where the sum ranges over the live vertices in C', and (ii) $\big||A_v| - |A_u|\big| \le 1$ for any two live vertices $u, v \in C'$. Denote by $v_i$ the ith vertex of C' to die. We conclude that $a_{v_i} \le \lceil g/(|C'| - i + 1) \rceil$. So $f_2 \le g + g H_{|C'|} = O(g H_{|C|}) = O(g \cdot r^\infty_{\mathrm{obl}}(G_1,k))$.

Sub-phase III faults: Denote by Π the set of stale pages at the beginning of the sub-phase. The vertices in Π have degree 2 in $G_{II}$, since any u such that $\deg_{G_{II}}(u) \ne \deg_{G_0}(u)$ must have been marked by now. Denote by B the subgraph of $G_{II}$ induced on $(P \cup P') \setminus \Pi$, where P is the set of vertices requested in the previous phase, and P' is the set of vertices requested in sub-phases I and II.

Proposition 3.2. B is connected.

Proof. The vertices in P' lie on a common path in $G_{II}$ (the walk formed by the requests of sub-phases I and II), and since Π contains no vertex of P', P' is contained in a single connected component of B.

Assume, for the sake of contradiction, that B has more than one connected component. Then there is another connected component X. Observe that X must intersect C, since $P \setminus \Pi \subseteq C$. For $v \in X \cap C$, let $A'_v$ be $A_v$ at the time v dies. Note that $X = \bigcup_{v \in X \cap C} A'_v$. As X does not contain marked pages, the only way the vertices in $X \cap C$ could have died is by having no stale page (page in Π) adjacent to X. However, $G_{II}$ is connected and X is a connected component of B, so some vertex adjacent to X lies outside B, i.e., in Π. A contradiction. □

Denote by $\mathcal{P}$ the set of paths induced by Π in $G_{II}$. The endpoints of the paths in $\mathcal{P}$ are adjacent (in $G_{II}$) to B. Thus, by Proposition 3.2, $(B, \mathcal{P})$ is a vine decomposition of a subgraph of $G_{II}$ with at most k + g vertices. Denote by L the number of vertices in the paths of $\mathcal{P}$, i.e., $L = |\Pi|$. Denote by g' the number of holes in Π at the end of the phase. Clearly, $g' \le g$, and at most $L - g'$ vertices of Π can be requested during this sub-phase. As in sub-phase I, the probability that the ith requested vertex in Π ($1 \le i \le L - g'$) is a hole is at most $g'/(L - i + 1)$. Thus,

$$f_3 \le g'(H_L - H_{g'}) \le g'(\ln L - \ln g' + 1) \le g(\ln L - \ln g + 1).$$
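The chain of inequalities above can be spot-checked numerically. The Python sketch below (our illustration, a check of small cases rather than a proof) evaluates the three expressions for all $1 \le g' \le g \le L \le 60$. The restriction $g \le L$ is an assumption of the check; it is what makes the last step valid, because $x(\ln L - \ln x + 1)$ is non-decreasing for $x \le L$.

```python
import math


def harmonic(n: int) -> float:
    """H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / j for j in range(1, n + 1))


def chain_holds(g_prime: int, g: int, L: int, eps: float = 1e-12) -> bool:
    """g'(H_L - H_g') <= g'(ln L - ln g' + 1) <= g(ln L - ln g + 1)."""
    first = g_prime * (harmonic(L) - harmonic(g_prime))
    second = g_prime * (math.log(L) - math.log(g_prime) + 1)
    third = g * (math.log(L) - math.log(g) + 1)
    return first <= second + eps and second <= third + eps
```

For instance, with g' = 2, g = 3 and L = 10 the three expressions are roughly 3.86, 5.22 and 6.61.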

By Lemma 2.8, $r^\infty_{\mathrm{obl}}(G_{II},k) = \Omega(\log L - \log g)$. Therefore, $f_3 = O(g \cdot r^\infty_{\mathrm{obl}}(G_{II},k))$. We conclude:
Theorem 1. Rto is very strongly competitive on any underlying access graph.

4 Deterministic Algorithms 

Next, we present Dto, a deterministic truly online paging algorithm. Dto is similar to Fiat and Karlin's deterministic algorithm, but instead of using a known access graph G, it makes use of the dynamic tree $G_0$. As in Rto, the first two sub-phases construct a vine decomposition in $G_{II}$ such that requested and evicted vertices form its backbone. Here, however, in sub-phase III Dto attempts to evict pages lying in the middle of paths of unmarked vertices.

As in Rto, all graph relations described herein refer to $G_0$. A path p in the graph satisfying a certain property is called maximal (with respect to containment) if there is no path q that properly contains p and also satisfies that property. A midpoint of a path is a vertex or an edge that is exactly in the middle of the path, i.e., at equal distance from both its endpoints.

Subphase I: Denote by C the set of vertices of degree not equal two in Gq. On a fault, evict an 
unmarked unevicted (stale) page ueC. If there is no such page, and the phase is not over, 
proceed to sub-phase II. 

Subphase II: At the beginning of the sub-phase, all stale pages lie on degree-2 vertices in Go. 
Denote by C' C C, the set of holes at the beginning of the subphase II. For each v £ C", we 
maintain a dynamic set A v . At the beginning of the subphase, A v = {v}. A vertex v € C' 
is called "alive" as long as A v contains only holes, and there exists a stale vertex adjacent to 
A v . 

On a fault, choose a live $v \in C'$. If no $v$ meets this criterion, proceed to the next sub-phase.
Otherwise, evict a stale vertex $w$ adjacent to $A_v$, and set $A_v \leftarrow A_v \cup \{w\}$.

Subphase III: On a fault, choose a maximal (w.r.t. containment) path $p$ of unmarked vertices
that contains a stale page, and evict the stale page in $p$ that is closest to the midpoint of $p$.
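As an illustration only, the sub-phase III eviction rule can be sketched for a single path of $G_O$. This is a hypothetical simplification, not the authors' code. It rests on three assumptions. First, every vertex on the path is encoded as `marked`, `stale` (unmarked and still in memory) or `hole` (unmarked and already evicted). Second, the first maximal unmarked run that contains a stale page is chosen. Third, a tie between two equally central stale pages is broken toward the lower index.

```python
from typing import List, Optional

MARKED, STALE, HOLE = "marked", "stale", "hole"


def unmarked_runs(states: List[str]) -> List[range]:
    """Maximal runs of consecutive unmarked (stale or hole) positions on the path."""
    runs: List[range] = []
    start: Optional[int] = None
    for i, s in enumerate(states + [MARKED]):  # a trailing sentinel closes the last run
        if s != MARKED and start is None:
            start = i
        elif s == MARKED and start is not None:
            runs.append(range(start, i))
            start = None
    return runs


def choose_eviction(states: List[str]) -> Optional[int]:
    """Index of the stale page evicted by the sub-phase III rule, or None if there is none."""
    for run in unmarked_runs(states):
        stale = [i for i in run if states[i] == STALE]
        if stale:
            # The midpoint of the run is a vertex (integer) or an edge (half-integer).
            mid = (run[0] + run[-1]) / 2
            return min(stale, key=lambda i: (abs(i - mid), i))
    return None
```

For example, on the path `[marked, stale, stale, hole, stale, stale, marked]` the unmarked run spans positions 1 to 5. Its midpoint is position 3, which is a hole, so the sketch evicts position 2.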

Competitive Analysis 

Let $g$ be the number of new pages requested during the entire phase, and let $f_i$ denote the number
of pages evicted during sub-phase $i$. As in the case of Rto, the total number of faults in the phase
is at most $f_1 + f_2 + f_3 + g$. We will show that $f_i = O(g \cdot r^\infty(G_{III}, k))$.

Sub-phases I & II. Let $C \subseteq P$ be the set of vertices in $P$ with degree $\ne 2$ in $G_O$. As in the
analysis of Rto, denote by $a_v$ the size of $A_v$ immediately before $v \in C'$ "dies". Since $f_1 + f_2 =
|C| + \sum_{v \in C'}(a_v - 1)$ and $a_v \le g$, we conclude that $f_1 + f_2 \le |C| + |C'|(g - 1) \le g \cdot |C|$. From Proposition 2.4, $G_I$ is
a tree on $k + 1$ vertices with $\Omega(|C|)$ leaves, and by Lemma 2.3, $r^\infty(G_I, k) = \Omega(|C|)$; therefore
$f_1 + f_2 = O(g \cdot r^\infty(G_I, k))$.

Sub-phase III faults: Denote by $H$ the set of stale pages at the beginning of the sub-phase. The
vertices in $H$ have degree 2 in $G_{II}$, since any $u$ such that $\deg_{G_{II}}(u) \ne \deg_{G_O}(u)$ must have been marked
by now. Denote by $B$ the sub-graph of $G_{II}$ induced on $(P \cup P') \setminus H$, where $P$ are the vertices
requested in the previous phase and $P'$ are the vertices requested in sub-phases I and II.
Figure 1: Possible scenarios during sub-phase III.



Proposition 4.1. B is connected. 

Proof. Similar to the proof of Proposition 3.2. □

Denote by $\Pi$ the set of paths induced by the stale pages. The endpoints of the paths in $\Pi$ are adjacent to
$B$, hence $(B, \Pi)$ is a vine decomposition of $G_{II}$. Unlike the case of Rto, where the algorithm faults
at most $g \log|p|$ times on every path $p \in \Pi$, Dto might fault on every vertex of every path $p \in \Pi$.
Nonetheless, in Section 5 we prove:

Lemma 4.2. $f_3 = O(g \cdot r^\infty(G_{III}, k))$.

We conclude: 

Theorem 2. Dto is very strongly competitive on any underlying access graph. 

5 Proof of Lemma 4.2

Our proof of Lemma 4.2 is quite lengthy. To make the exposition simpler it is partitioned as follows:
Section 5.1 presents the complications in proving the lemma and gives some intuition. Section 5.2
introduces the notation used throughout the proof. Section 5.3 provides the proof, leaving out
some combinatorial lemmas. Section 5.4 concludes the exposition by providing the missing proofs.

5.1 Informal Exposition 

First we should note that the situation here is quite different from the randomized case. In the
randomized case the upper bound on the number of faults is not influenced by the new edges
revealed in sub-phase III. In contrast, in the deterministic case, the added edges can increase the
number of faults. For example, in case 1 in Fig. 1, at the end of sub-phase II we have a path
$(u, \dots, v)$ in the vine decomposition $\Pi$ in $G_{II}$, which the new edges split into sub-paths of lengths $n_1, n_2, \dots$. Hence, the naive lower bound for the number of
faults in this vine is $g \log(\sum_i n_i)$, whereas Dto might incur almost $g \sum_i \log n_i$ faults there, which
can be much higher. In this example the solution is clear: we should construct a new vine
decomposition that uses the new edges as part of the backbone and has a value of $\sum_i \log n_i$.
I.e., we improve the lower bound on the number of faults of Opt to match the upper bound.
Case 2 in Fig. 1 is more complex. Here the construction of a new vine decomposition is not
obvious. The scenario addressed here includes cases where new edges connect one $G_{II}$ path to
another. These new edges split the paths into sub-paths. If the lengths of the resulting sub-paths
were $(n_i)_i$, then one upper bound on the number of faults for Dto during sub-phase III would be
$g \sum_i \log n_i$. It is not obvious that the adversary can actually force such a number of faults.
However, we will prove that in this scenario it is possible to build a new vine decomposition with
a value of $c \sum_i \log n_i$ for some global constant $c > 0$. Again, we have found matching upper and
lower bounds.

The situation becomes more complicated when the new edges do not cross path boundaries,
as in case 3 in Fig. 1. In this case we cannot hope to construct a vine decomposition with value
$\Omega(\sum_i \log n_i)$; such a vine decomposition simply does not exist. Here we will have to show that the
upper bound on the number of faults for Dto is indeed $O(\log(\sum_i n_i) + \sum_i \log n_{2i})$, which is smaller
than $\sum_i \log n_i$.

The difference between cases 1 and 2 in Fig. 1 and case 3 is that in cases 1 and 2 we used a simple
upper bound on the number of faults and could devise an appropriate vine decomposition for the
lower bound on the competitive ratio. In case 3 we need a more sophisticated upper bound as well
as a more involved construction of the vine decomposition for the lower bound.

5.2 Preliminaries 

During sub-phase III, new pages might be requested (at most $g - 1$ new pages). As we can associate
an amortized cost of $\Omega(1)$ to any offline algorithm for every new page, we would like to "ignore"
them, but we need to consider the connectivity relations they induce.

Definition 5.1. The simplification of $G_{III}$ is a graph denoted by $G_s = (V, E)$, such that $V$ is
the set of vertices in $G_{II}$ and $E$ includes the edges of $G_{II}$ and the edges $uv$ for which there exists a path in
$G_{III}$ between $u, v \in V$ all of whose internal vertices are not in $V$, i.e., they are new vertices
requested during sub-phase III.
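Definition 5.1 can be computed with one breadth-first search from each old vertex, where the search may only pass through new vertices. The sketch below is hypothetical (the name `simplify` is ours). It assumes graphs are adjacency dictionaries mapping each vertex to its set of neighbours, and that `old` is $V = V[G_{II}]$. Direct edges of $G_{III}$ between old vertices are paths with no internal vertices, so they are kept automatically.

```python
from collections import deque
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]


def simplify(g3: Graph, old: Set[Hashable]) -> Graph:
    """Return G_s: u ~ v iff G_III has a u-v path whose internal vertices are all new."""
    gs: Graph = {v: set() for v in old}
    for u in old:
        seen = {u}
        queue = deque([u])
        while queue:
            x = queue.popleft()
            for y in g3.get(x, set()):
                if y in seen:
                    continue
                seen.add(y)
                if y in old:
                    gs[u].add(y)     # reached an old vertex: record an edge, do not pass through it
                else:
                    queue.append(y)  # new vertex: the path may continue through it
    return gs
```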

It will be more convenient for us to work with $G_s$, as the vertices in which we are interested
(stale pages at the end of sub-phase II) are already in $G_s$, and $G_s$ has the same set of vertices as $G_{II}$,
just with more edges. However, in the conclusion of the proof, we will have to account for the fact
that the actual graph, $G_{III}$, might have another $g - 1$ vertices.

As mentioned in Section 5.1, the vine decomposition of $G_{II}$ may not give us a sufficiently high
lower bound. In order to differentiate it from the final vine decomposition, we call it the backbone
bi-connected path complex in $G_s$, or simply the complex.

Given a graph $G$ we denote its set of vertices by $V[G]$. For $U \subseteq V[G]$ we denote the sub-graph
induced by $G$ on $U$ as $G|_U$. Given a simple path $p = (v_1, \dots, v_k)$ we define the inner subpath
$\mathcal{I}(p) = (v_2, v_3, \dots, v_{k-1})$.

Definition 5.2. A proper path $p$ is a path in $G_s$ such that every edge with one endpoint in $V[\mathcal{I}(p)]$ has
its other endpoint in $V[p]$.
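Definition 5.2 amounts to a direct check. The sketch below assumes $G_s$ is stored as an adjacency dictionary and the path as a list of vertices; the function name is hypothetical.

```python
from typing import Dict, Hashable, List, Set

Graph = Dict[Hashable, Set[Hashable]]


def is_proper(gs: Graph, path: List[Hashable]) -> bool:
    """Definition 5.2: every edge leaving an inner vertex of `path` ends on `path`."""
    on_path = set(path)
    # path[1:-1] is the inner subpath I(p); the endpoints may have arbitrary neighbours.
    return all(w in on_path for v in path[1:-1] for w in gs[v])
```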

Note that the new edges added to $G_{II}$ during the course of sub-phase III decompose the paths
of the complex into disjoint sub-paths. We view this decomposition as a hierarchical process as
follows:

1. We "add" to the decomposition all the new edges that cross path boundaries, which results
in a decomposition into proper sub-paths.
2. For every resulting sub-path we recursively construct a new decomposition.

We now formally define the concepts of decomposition and recursive decomposition:

Definition 5.3. Given the complex $C = (B, Q)$, a separating set for $C$ is a set $S$ of vertices satisfying
$S \subseteq \bigcup_{q \in Q} V[q]$ and $\deg_{G_s}(v) \ge 3$ for all $v \in S$.

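Definition 5.4. Given the complex $C = (B, Q)$ and a separating set $S$ for $C$, we define the
decomposition $\mathcal{D} = (S, \mathcal{P})$ of $C$ as follows:

Fix $q = (v_1, v_2, \dots, v_k) \in Q$. Let $S \cap V[q] = \{v_{i_1}, v_{i_2}, \dots, v_{i_j}\}$, where $i_\ell > i_{\ell-1}$ for $2 \le \ell \le j$.

Define the paths

$$\begin{aligned}
p_1 &= (v_1, v_2, \dots, v_{i_1-1}),\\
p_2 &= (v_{i_1+1}, v_{i_1+2}, \dots, v_{i_2-1}),\\
&\;\;\vdots\\
p_j &= (v_{i_{j-1}+1}, v_{i_{j-1}+2}, \dots, v_{i_j-1}),\\
p_{j+1} &= (v_{i_j+1}, v_{i_j+2}, \dots, v_k).
\end{aligned}$$

Let $\mathcal{P}(q)$ be the set of all the non-empty $p_\ell$ paths, $1 \le \ell \le j + 1$. Let $\mathcal{P}$ be the union of all $\mathcal{P}(q)$,
$q \in Q$. $S$ is called the separating set of $\mathcal{D}$, and it is denoted by $S[\mathcal{D}]$.

A proper decomposition $(S, \mathcal{P})$ is a decomposition in which all paths $p \in \mathcal{P}$ are proper paths.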
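A decomposition as in Definition 5.4 simply cuts each path at its separating vertices and discards the empty pieces. The following is a minimal sketch with hypothetical names. It assumes paths are lists of vertices and $S$ is a set, and it does not check that $S$ is a valid separating set.

```python
from typing import Hashable, List, Set


def split_by_separators(q: List[Hashable], S: Set[Hashable]) -> List[List[Hashable]]:
    """The non-empty paths p_1, ..., p_{j+1} obtained by removing S from q, in order."""
    pieces: List[List[Hashable]] = []
    current: List[Hashable] = []
    for v in q:
        if v in S:
            if current:            # consecutive separators produce an empty piece: skip it
                pieces.append(current)
            current = []
        else:
            current.append(v)
    if current:
        pieces.append(current)
    return pieces


def decompose_complex(Q: List[List[Hashable]], S: Set[Hashable]) -> List[List[Hashable]]:
    """The path set P of (S, P): the union of the pieces of every q in Q."""
    return [p for q in Q for p in split_by_separators(q, S)]
```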

Definition 5.5. Given a proper path $p = (v_1, v_2, \dots, v_k)$, a non-empty set $S \subseteq V[p]$ is called a
separating set for $p$ if $S = \{v_{i_1}, v_{i_2}, \dots, v_{i_j}\}$, where $i_\ell > i_{\ell-1}$ for $2 \le \ell \le j$, satisfying:

1. For $1 \le \ell \le j$:

(a) $\deg_{G_s}(v_{i_\ell}) \ge 3$, or

(b) $\deg_{G_s}(v_{i_\ell}) = 2$ and there is no edge between the sets $\{v_1, \dots, v_{i_\ell-1}\}$ and $\{v_{i_\ell+1}, \dots, v_k\}$
in $G_s$.

2. For $1 \le \ell \le j - 1$, if $i_\ell < i_{\ell+1} - 1$, then both $\deg_{G_s}(v_{i_\ell}) \ge 3$ and $\deg_{G_s}(v_{i_{\ell+1}}) \ge 3$.

3. If $i_1 > 1$ then $\deg_{G_s}(v_{i_1}) \ge 3$, and if $i_j < k$ then $\deg_{G_s}(v_{i_j}) \ge 3$.
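The three conditions of Definition 5.5 can be checked mechanically. The helper below is hypothetical and not part of the paper. It assumes $G_s$ is an adjacency dictionary and $p$ is given as a list of vertices. Positions are indexed from 0, so the paper's condition $i_1 > 1$ becomes `idx[0] > 0`.

```python
from typing import Dict, Hashable, Iterable, List, Set

Graph = Dict[Hashable, Set[Hashable]]


def is_separating_set(gs: Graph, p: List[Hashable], idx: Iterable[int]) -> bool:
    """Check Definition 5.5 for S = {p[i] : i in idx}, using 0-based positions on p."""
    idx = sorted(set(idx))
    if not idx:
        return False                        # S must be non-empty
    k = len(p)

    def deg(i: int) -> int:
        return len(gs[p[i]])

    def crossed(i: int) -> bool:
        """Is there an edge of G_s between p[:i] and p[i+1:]?"""
        right = set(p[i + 1:])
        return any(w in right for v in p[:i] for w in gs[v])

    # Condition 1: degree >= 3, or degree 2 with no edge jumping over the vertex.
    if any(not (deg(i) >= 3 or (deg(i) == 2 and not crossed(i))) for i in idx):
        return False
    # Condition 2: consecutive separators with a non-empty gap both have degree >= 3.
    if any(b > a + 1 and (deg(a) < 3 or deg(b) < 3) for a, b in zip(idx, idx[1:])):
        return False
    # Condition 3: the first/last separator has degree >= 3 unless it is an endpoint of p.
    if idx[0] > 0 and deg(idx[0]) < 3:
        return False
    if idx[-1] < k - 1 and deg(idx[-1]) < 3:
        return False
    return True
```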

Definition 5.6. Given a proper path $p = (v_1, v_2, \dots, v_k)$ and a separating set for $p$, $S = \{v_{i_1}, v_{i_2}, \dots, v_{i_j}\}$
($i_1 < \dots < i_j$), the decomposition $\mathcal{D} = (S, \mathcal{P})$ of $p$ is defined as follows. Let $p_1, \dots, p_{j+1}$ denote the
paths

$$\begin{aligned}
p_1 &= (v_1, v_2, \dots, v_{i_1-1}),\\
p_2 &= (v_{i_1+1}, v_{i_1+2}, \dots, v_{i_2-1}),\\
&\;\;\vdots\\
p_j &= (v_{i_{j-1}+1}, v_{i_{j-1}+2}, \dots, v_{i_j-1}),\\
p_{j+1} &= (v_{i_j+1}, v_{i_j+2}, \dots, v_k).
\end{aligned}$$

Let $\mathcal{P}$ be the set of all the non-empty $p_\ell$ paths, $1 \le \ell \le j + 1$. $S$ is also called the separating set of
$\mathcal{D}$, and it is denoted by $S[\mathcal{D}]$, or by $S[p]$ if $\mathcal{D}$ is clear from the context.

A proper decomposition $(S, \mathcal{P})$ is a decomposition in which all paths $p \in \mathcal{P}$ are proper paths.
A key point in the above definitions is the allowance for vertices of degree 2 to be in the
separating set of a decomposition of a proper path (Def. 5.5), under certain restrictions. This
solves the problem posed by the third example in Fig. 1: the odd sub-paths there can now be
part of the separating set, and not part of the paths of the decomposition.

Definition 5.7. A recursive decomposition $\mathcal{D}^R(q)$ of a proper path $q$ is a proper decomposition
$\mathcal{D} = (S, \mathcal{P})$ of $q$, along with recursive decompositions $\mathcal{D}^R(p)$ for each $p \in \mathcal{P}$. We define the value
of $\mathcal{D}^R(q)$ recursively as

$$\nu(\mathcal{D}^R(q)) = \log|q| + \sum_{p \in \mathcal{P}} \nu(\mathcal{D}^R(p)).$$

Definition 5.8. A recursive decomposition $\mathcal{D}^R(C)$ of the complex $C = (B, Q)$ is a proper decomposition
$\mathcal{D} = (S, \mathcal{P})$ of $C$ along with recursive decompositions $\mathcal{D}^R(p)$ for each proper path
$p \in \mathcal{P}$. The value of $\mathcal{D}^R(C)$ is defined as

$$\nu(\mathcal{D}^R(C)) = \sum_{p \in \mathcal{P}} \nu(\mathcal{D}^R(p)).$$

We use the shorthand $\mathcal{D}^R$ when the complex $C$ or the proper path $q$ is implicitly understood.

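Definition 5.9. We define $\mathcal{P}[\mathcal{D}^R]$ to be the set of all sub-paths in the recursive decomposition $\mathcal{D}^R$,
including sub-paths defined recursively. We also define $\mathcal{P}_g[\mathcal{D}^R] = \{p \in \mathcal{P}[\mathcal{D}^R] : |p| \ge g\}$, and

$$\nu_g(\mathcal{D}^R) = \sum_{p \in \mathcal{P}_g[\mathcal{D}^R]} (\log|p| - \log g).$$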
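Definitions 5.7 to 5.9 translate directly into a recursive computation. In the sketch below, a recursive decomposition is represented only by the lengths of its sub-paths, using the hypothetical class `RecPath`. Logarithms are taken base 2. That base is an assumption, though it is consistent with the later use of $\log 16 = 4$.

```python
import math
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class RecPath:
    """A sub-path p of a recursive decomposition: its length |p| and the paths of D^R(p)."""
    length: int
    children: List["RecPath"] = field(default_factory=list)


def all_paths(top: List[RecPath]) -> Iterator[RecPath]:
    """Enumerate P[D^R]: every sub-path, including those defined recursively."""
    stack = list(top)
    while stack:
        p = stack.pop()
        yield p
        stack.extend(p.children)


def nu_path(p: RecPath) -> float:
    """Definition 5.7: nu(D^R(q)) = log|q| + sum of nu(D^R(p)) over the pieces p."""
    return math.log2(p.length) + sum(nu_path(c) for c in p.children)


def nu_complex(top: List[RecPath]) -> float:
    """Definition 5.8: the value of D^R(C) sums the values of the top-level pieces."""
    return sum(nu_path(p) for p in top)


def nu_g(top: List[RecPath], g: int) -> float:
    """Definition 5.9: sum of log|p| - log g over all sub-paths with |p| >= g."""
    return sum(math.log2(p.length) - math.log2(g) for p in all_paths(top) if p.length >= g)
```

Take, for instance, a complex whose top-level sub-paths have lengths 16 (split into two sub-paths of length 4) and 8. Its value is $4 + 2 + 2 + 3 = 11$, which equals $\nu_1$, while $\nu_4 = 2 + 0 + 0 + 1 = 3$.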

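Proposition 5.10. Let $\mathcal{D}^R = \mathcal{D}^R(C)$ be a recursive decomposition of the complex $C = (B, Q)$, and
let $(S_0, \mathcal{P}_0)$ be the top-level proper decomposition of $C$ in $\mathcal{D}^R$. Then the following hold:

1. $\forall p \in \mathcal{P}[\mathcal{D}^R]$, $S_0 \cap S[p] = \emptyset$.

2. $\forall p, q \in \mathcal{P}[\mathcal{D}^R]$, if $p \ne q$ then $S[p] \cap S[q] = \emptyset$.

3. $S_0 \cup \bigcup_{p \in \mathcal{P}[\mathcal{D}^R]} S[p] = \bigcup_{q \in Q} V[q]$.

4. $\nu(\mathcal{D}^R) = \sum_{p \in \mathcal{P}[\mathcal{D}^R]} \log|p|$.

5. $\nu_1(\mathcal{D}^R) = \nu(\mathcal{D}^R)$.

6. $|\mathcal{P}[\mathcal{D}^R]| \le \ell + t$, where $\ell = |\{v \in \bigcup_{q \in Q} V[q] : \deg_{G_s}(v) \ge 3\}|$ and $t = |Q|$.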

Proof. Items 1-5 follow immediately from the definitions. To prove item 6, we first argue that for any
proper path $p$, $|\mathcal{P}[\mathcal{D}^R(p)]| \le \ell_p + 1$, where $\ell_p$ is the number of vertices of degree 3 or more in $p$. Next,
we sum over all $p \in \mathcal{P}_0$, getting an extra $t$. □

Example 5.11. Consider the following (perhaps the simplest) recursive decomposition of the complex
$C = (B, Q)$: The first level consists of a separating set that includes all the vertices on the
paths in $Q$ with degree at least 3. The rest of the vertices have degree 2 and are grouped into sub-paths
(which are proper). In the second level of the recursive decomposition, each such sub-path is
decomposed so that all its vertices are in the separating set. As we shall prove in Lemma 5.12, any
recursive decomposition of the complex implies an upper bound on $f_3$. However, the upper bound
implied by this recursive decomposition is not tight (see the discussion in Section 5.1 about the
third case in Fig. 1), and we need the full generality of the definition of recursive decomposition in
order to create a recursive decomposition that implies a tight upper bound on $f_3$.

5.3 The Proof 
The Upper Bound 

Lemma 5.12. Let $C = (B, Q)$, $|Q| = t$, be the complex in $G_s$ induced by Dto at the end of sub-phase
II. Let $\mathcal{D}^R$ be a recursive decomposition of the complex $C$. Let $g$ be the number of new pages
in the phase, and let $\ell$ be the number of vertices in $\bigcup_{q \in Q} V[q]$ with degree at least 3 in $G_s$. Then,

$$f_3 = O\big(g(\ell + t + \nu_g(\mathcal{D}^R))\big).$$

Proof. There are at most g faults not on the paths in Q (i.e., faults on new vertices requested for 
the first time in the phase during sub-phase III). 

We count the number of faults on the paths by charging faults to vertices of degree $\ge 3$ in $G_s$
or to paths in the recursive decomposition of the complex.

If the fault is on a vertex of degree $\ge 3$ in $G_s$, then we charge it to the vertex. There are at
most $\ell$ such faults.

Otherwise, the fault is on a vertex $v$ of degree 2 in $G_s$. From Proposition 5.10, $v$ must be in
$S[p]$ for some path $p \in \mathcal{P}[\mathcal{D}^R]$. It cannot be in the separating set of the complex itself, because
all vertices in the separating set of the complex have degree $\ge 3$ in $G_s$.

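We charge the path $p$ for the fault on $v$. We want to show that there are at most $O(g \cdot
\max\{\log|p| - \log g, 1\})$ faults associated with $p$. If this is true, then, using $|\mathcal{P}[\mathcal{D}^R]| \le \ell + t$ (Proposition 5.10),

$$\begin{aligned}
f_3 &\le g + \ell + \sum_{p \in \mathcal{P}[\mathcal{D}^R],\ |p| < 2g} O(g) + \sum_{p \in \mathcal{P}_{2g}[\mathcal{D}^R]} O\big(g(\log|p| - \log g)\big)\\
&\le g + \ell + (\ell + t) \cdot O(g) + O\big(g\,\nu_g(\mathcal{D}^R)\big),
\end{aligned}$$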

and the proof of the lemma would be completed. 

Let $S_2(p)$ denote the set of all vertices of degree 2 in $S[p]$. Let $U(p) \subseteq S_2(p)$ denote the set of
unmarked vertices in $S_2(p)$. Over time, when unmarked vertices are requested, they are removed
from $U(p)$.

For $X \subseteq p$, let $C(X)$ denote the minimal sub-path of $p$ that contains all vertices in $X$.
We use the notation $C(U(p))$ to denote a set whose size may decrease over time (as $U(p)$ itself is
a set whose size may decrease over time).

Proposition 5.13. All vertices in C(U(p)) are unmarked. 

Proof. The proof follows from the fact that $p$ is a proper path. Let $u, v \in S_2(p)$ be distinct vertices,
and assume that $w \in C(\{u, v\})$ and $x \notin C(\{u, v\})$. Then any path from $x$
to $w$ must pass through either $u$ or $v$. Thus, if any vertex in $C(\{u, v\})$ was requested, then
either $u$ or $v$ was requested, i.e., either $u$ or $v$ is marked.

Taking $u$ and $v$ to be the extreme points of $C(U(p))$ (which must also be in $S_2(p)$) concludes
the proof of the claim. □
Let $q \in Q$ be a path in $C$ that contains $p$ as a sub-path. Let $M(U(p))$ denote the longest
unmarked subpath of $q$ containing $C(U(p))$, so $M(U(p))$ also varies over time. When Dto
evicts a vertex from $M(U(p))$, it is the closest stale page to the midpoint of $M(U(p))$. As there are
at most $g$ non-stale pages in $M(U(p))$, the evicted page is at distance at most $g$ from the midpoint
of $M(U(p))$.

Proposition 5.14. After $g + 1$ evictions from $S_2(p)$, $|M(U(p))| \le 2(|p| + g)$.

Proof. We first claim that after any fault in $S_2(p)$, $p$ includes at least one of the extreme points of
$M(U(p))$. This follows because $C(U(p)) \subseteq M(U(p)) \subseteq q$ and $C(U(p)) \subseteq p \subseteq q$, so $p \cap M(U(p)) \ne \emptyset$.
As $p$ and $M(U(p))$ are both subpaths of $q$, either $p$ is a subpath of $M(U(p))$ or an
extreme point of $M(U(p))$ is in $p$. $p$ cannot be a subpath of $M(U(p))$, because $p$ contains a marked
vertex whereas $M(U(p))$ consists only of unmarked vertices. Thus, an extreme
point of $M(U(p))$ must be in $p$.

There must be at least one fault in $S_2(p)$ prior to the $(g+1)$st eviction from $S_2(p)$. On the next
eviction from $S_2(p)$ following the first fault in $S_2(p)$, we know that a vertex at distance at most $g$
from the midpoint of $M(U(p))$ is in $S_2(p) \subseteq p$, whereas one endpoint of $M(U(p))$ is in $p$; thus
$|M(U(p))| \le 2(|p| + g)$. □

The $g + 1$ evictions from $S_2(p)$ described in the proposition above can cause at most $g + 1$ faults in
$S_2(p)$. This means that we have associated at most $g + 1$ faults with $p$ prior to the configuration
where $|M(U(p))| \le 2(|p| + g)$.

We now count the number of evictions from this point onwards; this is a bound on the number
of faults in $S_2(p)$.

After every $g$ evictions, the size of $M(U(p))$ decreases by a factor of roughly $1/2$. As long as
$|M(U(p))| \ge 10g$, this factor is at most $6/10$. Thus, after $O(g(\log|p| - \log(10g)))$ evictions we have
$|M(U(p))| < 10g$. On the remaining vertices we can fault at most $10g$ times, giving a total
number of faults at this stage of $O(g \cdot \max\{\log|p| - \log g, 1\})$. □
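The counting above can be replayed numerically. The sketch models the proof's worst case: each round of $g$ evictions shrinks $M(U(p))$ from $m$ to at most $m/2 + g$. That bound is at most $0.6m$ exactly when $m \ge 10g$. The function names are hypothetical, and the model tracks nothing except the size of $M(U(p))$.

```python
import math


def eviction_bound(p_len: int, g: int) -> int:
    """Worst-case evictions from S_2(p) under the shrinking model of the proof."""
    m = 2 * (p_len + g)          # |M(U(p))| <= 2(|p| + g) after Proposition 5.14
    rounds = 0
    while m >= 10 * g:           # in this range m/2 + g <= 0.6 m
        m = m / 2 + g            # one round of g evictions roughly halves M(U(p))
        rounds += 1
    return g * rounds + 10 * g   # plus at most 10g faults once |M(U(p))| < 10g


def target(p_len: int, g: int) -> float:
    """The claimed order g * max{log|p| - log g, 1}, without its hidden constant."""
    return g * max(math.log2(p_len) - math.log2(g), 1.0)
```

With $|p| = 10^6$ and $g = 5$ the model needs 16 rounds, i.e. $5 \cdot 16 + 50 = 130$ evictions. For comparison, $g(\log|p| - \log g) \approx 88$, so the two agree up to a small constant.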

The Lower Bound 

The idea is to construct a vine decomposition $\mathcal{V} = (B, \mathcal{Q})$ on $k + 1$ vertices (see Definition 2.5)
whose value matches the value of some recursive decomposition $\mathcal{D}^R$, up to a constant factor:

$$\nu(\mathcal{V}) = \Omega\big(\nu(\mathcal{D}^R)\big). \qquad (2)$$

If this is true, then from Lemma 5.12, Lemma 2.3, Proposition 2.4 and Lemma 2.6 it follows
that the competitive ratio that the adversary can force upon any online algorithm matches, up to a constant factor,
the competitive ratio of Dto.

Consider a decomposition $\mathcal{D} = (S, \mathcal{P})$ (either a decomposition of a proper
path or of a complex). As a first step towards the goal of the previous paragraph, we
seek a vine decomposition $\mathcal{V} = (B, \mathcal{P}')$ such that

1. The set of paths $\mathcal{P}'$ is a subset of the set of paths $\mathcal{P}$.

2. The backbone $B$ of $\mathcal{V}$ includes the separating set $S$ of $\mathcal{D}$ and
the paths of $\mathcal{P} \setminus \mathcal{P}'$.

3. The value of $\mathcal{V}$ is at least a constant fraction of the value of $\mathcal{D}$.
Definition 5.15. A proper vine decomposition $\mathcal{V}$ of a proper path $p$ is a vine decomposition of the
graph induced by $G_s$ on the vertices of $p$, such that the paths of $\mathcal{V}$ are sub-paths of $p$, and the
endpoints of $p$ are in the backbone of $\mathcal{V}$.

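Definition 5.16. A vine selection of a decomposition $\mathcal{D} = (S, \mathcal{P})$ of a proper path $q$ is a set
$A(\mathcal{D}) = \mathcal{P}' \subseteq \mathcal{P}$ (whose elements are called vines) such that

• The induced graph on $S \cup \bigcup_{p \in \mathcal{P} \setminus A(\mathcal{D})} V[p]$ is connected.

• $\sum_{p \in A(\mathcal{D})} \log|p| \ge c \sum_{p \in \mathcal{P}} \log|p|$, where $c = 1/32$.

• The endpoints of $q$ are in $S \cup \bigcup_{p \in \mathcal{P} \setminus A(\mathcal{D})} V[p]$.

It follows that $\big(S \cup \bigcup_{p \in \mathcal{P} \setminus A(\mathcal{D})} V[p],\ A(\mathcal{D})\big)$ is a proper vine decomposition of $q$.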

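Definition 5.17. A vine selection of a decomposition $\mathcal{D} = (S, \mathcal{P})$ of a complex $C = (B, Q =
\{q_1, \dots, q_t\})$ is a set $A(\mathcal{D}) = \mathcal{P}' \subseteq \mathcal{P}$ such that

• The induced graph on $B \cup S \cup \bigcup_{p \in \mathcal{P} \setminus A(\mathcal{D})} V[p]$ is connected.

• $\sum_{p \in A(\mathcal{D})} \log|p| \ge c \sum_{p \in \mathcal{P}} \log|p|$, where $c = 1/32$.

It follows that $\big(B \cup S \cup \bigcup_{p \in \mathcal{P} \setminus A(\mathcal{D})} V[p],\ A(\mathcal{D})\big)$ is a vine decomposition of $G_s$.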
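Definition 5.16 can be checked for a concrete decomposition with one breadth-first search. The helper below is hypothetical. It assumes $G_s$ is an adjacency dictionary, that the sub-paths of $\mathcal{P}$ are lists, and that $A(\mathcal{D})$ is given by their indices. Logarithms are base 2. For a complex (Definition 5.17), one would add $B$ to the backbone and drop the endpoint test.

```python
import math
from collections import deque
from typing import Dict, Hashable, Iterable, List, Set

Graph = Dict[Hashable, Set[Hashable]]


def induced_connected(gs: Graph, nodes: Iterable[Hashable]) -> bool:
    """Is the subgraph of G_s induced on `nodes` non-empty and connected?"""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        x = queue.popleft()
        for y in gs[x]:
            if y in nodes and y not in seen:
                seen.add(y)
                queue.append(y)
    return seen == nodes


def is_vine_selection(gs: Graph, q: List[Hashable], S: Set[Hashable],
                      pieces: List[List[Hashable]], selected: Set[int],
                      c: float = 1 / 32) -> bool:
    """Definition 5.16 for a decomposition (S, pieces) of a proper path q; `selected` indexes A(D)."""
    backbone = set(S).union(*(pieces[i] for i in range(len(pieces)) if i not in selected))
    total = sum(math.log2(len(p)) for p in pieces)
    chosen = sum(math.log2(len(pieces[i])) for i in selected)
    return (induced_connected(gs, backbone)
            and chosen >= c * total
            and q[0] in backbone and q[-1] in backbone)
```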

To construct a vine decomposition as required in (2), we make use of a special type of decomposition,
called an irreducible decomposition.

Definition 5.18. Given a path $p = (v_1, \dots, v_m)$, denote the span of an edge $e = v_iv_j$ between
two vertices in $p$ by $|e|_p = |j - i|$. An irreducible path $p$ is a proper path in which every edge
$e$ whose endpoints are in $p$ satisfies $|e|_p \le |p|^{1/4}$. An irreducible decomposition is a proper decomposition
$\mathcal{D} = (S, \mathcal{P})$ such that all paths $p \in \mathcal{P}$ are irreducible.
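Definition 5.18 can be coded as follows. This is a hypothetical sketch that assumes an adjacency-dictionary $G_s$ and a path given as a list. The helper `violating_edges` returns the edges inside $p$ whose span exceeds $|p|^{1/4}$.

```python
from typing import Dict, FrozenSet, Hashable, List, Set

Graph = Dict[Hashable, Set[Hashable]]


def violating_edges(gs: Graph, p: List[Hashable]) -> Set[FrozenSet[Hashable]]:
    """Edges with both endpoints on p whose span |e|_p exceeds |p|^(1/4)."""
    pos = {v: i for i, v in enumerate(p)}
    limit = len(p) ** 0.25
    return {frozenset((v, w)) for v in p for w in gs[v]
            if w in pos and abs(pos[w] - pos[v]) > limit}


def is_irreducible(gs: Graph, p: List[Hashable]) -> bool:
    """Definition 5.18: p is proper and no edge inside p has span above |p|^(1/4)."""
    on_p = set(p)
    proper = all(w in on_p for v in p[1:-1] for w in gs[v])
    return proper and not violating_edges(gs, p)
```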

The following lemma shows that it is possible to construct an irreducible decomposition along
with a corresponding vine selection.

Given a simple path $p = (v_1, v_2, \dots, v_k)$, a maximal subpath of degree-2 vertices in $p$ is a subpath
$q = (v_i, v_{i+1}, \dots, v_j)$, $1 \le i \le j \le k$, such that $\deg_{G_s}(v_l) = 2$ for all $i \le l \le j$, while if $i > 1$
then $\deg_{G_s}(v_{i-1}) > 2$, and if $j < k$ then $\deg_{G_s}(v_{j+1}) > 2$.
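The hypothesis of Lemma 5.19 concerns these maximal runs. The sketch below uses hypothetical names and an adjacency-dictionary $G_s$, and splits a path into its maximal degree-2 subpaths. It treats every vertex of degree other than 2 as ending a run. This matches the definition under the assumption that path vertices have degree at least 2.

```python
from typing import Dict, Hashable, List, Set

Graph = Dict[Hashable, Set[Hashable]]


def maximal_degree2_subpaths(gs: Graph, p: List[Hashable]) -> List[List[Hashable]]:
    """Maximal runs of consecutive vertices of p that have degree exactly 2 in G_s."""
    runs: List[List[Hashable]] = []
    current: List[Hashable] = []
    for v in p:
        if len(gs[v]) == 2:
            current.append(v)
        else:                      # a vertex of any other degree ends the current run
            if current:
                runs.append(current)
            current = []
    if current:
        runs.append(current)
    return runs


def meets_lemma_5_19_hypothesis(gs: Graph, p: List[Hashable], min_len: int = 15) -> bool:
    """Does every maximal degree-2 subpath of p have at least `min_len` vertices?"""
    return all(len(r) >= min_len for r in maximal_degree2_subpaths(gs, p))
```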

Lemma 5.19.

1. Given a proper path $q$, and assuming that every maximal subpath of degree-2 vertices has at
least 15 vertices, $q$ has an irreducible decomposition $\mathcal{D} = (S, \mathcal{P})$ and a corresponding vine
selection $A(\mathcal{D})$.

2. Given a complex $C = (B, Q)$, and assuming that for all $q \in Q$ every maximal subpath of degree-2
vertices of $q$ has at least 15 vertices, $C$ has an irreducible decomposition $\mathcal{D} = (S, \mathcal{P})$
and a corresponding vine selection $A(\mathcal{D})$.

The proof of the lemma appears in Section 5.4.

We are now ready to construct the recursive decomposition and vine decomposition
required in Equation (2). We give a constructive algorithm that builds both simultaneously; the
algorithm makes use of recursive decompositions.
1. Use Lemma 5.19 to obtain an irreducible decomposition $\mathcal{D} = (S, \mathcal{P})$ and a related vine selection
$A(\mathcal{D})$;

2. Recursively find, for every sub-path $p \in \mathcal{P}$, a vine decomposition and a recursive decomposition;

3. Using the recursively obtained vine decompositions and the vine selection $A(\mathcal{D})$,
construct the required vine decomposition (Lemma 5.28 in Section 5.4).

Lemma 5.20 summarizes the construction for proper paths; the construction for a complex is
handled in Lemma 5.21.

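Lemma 5.20. There exists a constant $d$ with $c > d > 0$ (where $c = 1/32$ is the constant of Definition 5.16) such that any proper path $q$, with each maximal subpath of degree-2
vertices having at least 15 vertices, has a recursive decomposition $\mathcal{D}^R$ along with a proper vine
decomposition $\mathcal{V}$ of $q$ such that $\nu(\mathcal{V}) \ge d \cdot [\nu(\mathcal{D}^R) - \log|q|]$.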

The proof appears in Section 5.4. Lemma 5.20 is used in the following lemma to construct the
vine decomposition for the complex.

Lemma 5.21. Given the complex $C = (B, Q = \{q_1, \dots, q_t\})$ in $G_s$, let $\ell$ be the number of vertices
$x \in \bigcup_{q \in Q} V[q]$ such that $\deg_{G_s}(x) \ge 3$. There exist a recursive decomposition $\mathcal{D}^R = \mathcal{D}^R(C)$ and a
vine decomposition $\mathcal{V}$ of $C$ such that $\max\{\ell + t, \nu(\mathcal{V})\} = \Omega(\ell + t + \nu(\mathcal{D}^R))$.

Proof. First we change $G_s$ by adding new degree-2 vertices in such a way that every maximal
subpath of degree-2 vertices has length at least 15 (we need at most $14(\ell + t)$ new vertices). Denote
the resulting graph by $G'$. The complex $C$ in $G_s$ naturally induces a complex $C' = (B, Q' = \{q'_1, \dots, q'_t\})$
in $G'$. From Lemma 5.19 we have an irreducible decomposition $\mathcal{D}' = (S, \mathcal{P})$ of the complex $C'$, along
with a vine selection $A(\mathcal{D}')$. The vine selection $A(\mathcal{D}')$ induces a vine decomposition $\mathcal{V}'_1$ on $G'$ such
that $\nu(\mathcal{V}'_1) \ge c \sum_{p \in \mathcal{P}} \log|p|$.

From Lemma 5.20 we have, for each path $p \in \mathcal{P}$, a recursive decomposition $\mathcal{D}^R(p)$ and a proper
vine decomposition $\mathcal{V}_2(p)$ such that $\nu(\mathcal{V}_2(p)) \ge d[\nu(\mathcal{D}^R(p)) - \log|p|]$.

We construct a new vine decomposition $\mathcal{V}'_2$ for $G'$. The set of paths in $\mathcal{V}'_2$ is the union of the
paths of all the $\mathcal{V}_2(p)$, $p \in \mathcal{P}$. The backbone of $\mathcal{V}'_2$ is the union of the sets $B$, $\{B'_p\}_{p \in \mathcal{P}}$, and $S$, where
$B$ is the backbone of $C$, $B'_p$ is the backbone of the proper vine decomposition $\mathcal{V}_2(p)$, and
$S$ is the separating set of $\mathcal{D}'$.

We show that $\mathcal{V}'_2$ is indeed a vine decomposition by showing that the backbone is connected
and that all the paths of $\mathcal{V}'_2$ are adjacent to the backbone at their endpoints.

Consider a path $q = (v_1, \dots, v_r) \in Q$. We shall see that $q \cap S$ is connected to $B$ in the
backbone of $\mathcal{V}'_2$ whenever $q \cap S$ is non-empty. Let $q \cap S = \{v_{i_1}, \dots, v_{i_s}\}$, ordered by their order on
$q$. Let $i_0 = 0$, and let $v_{i_0} \in B$ be a vertex in $B$ adjacent to $v_1$. We prove that for all $j \in \{0, \dots, s - 1\}$,
$v_{i_j}$ and $v_{i_{j+1}}$ are connected in the backbone of $\mathcal{V}'_2$. Note that either $i_j + 1 = i_{j+1}$, in which case
the two vertices are adjacent in $q$, or otherwise there is a path $p \in \mathcal{P}$ of $\mathcal{D}'$ between them, whose endpoints are adjacent to $v_{i_j}$ and $v_{i_{j+1}}$. As
$\mathcal{V}_2(p)$ is a proper vine decomposition of $p$ (see Def. 5.15), the endpoints of $p$ are in $B'_p$ and are thus
connected in $B'_p$. So $v_{i_j}$ and $v_{i_{j+1}}$ are also connected via $B'_p$.

Within every path $p \in \mathcal{P}$ we have a valid vine decomposition, whose backbone is connected to $B$
via vertices in $S$; thus $\mathcal{V}'_2$ is a legal vine decomposition. It also follows that $\nu(\mathcal{V}'_2) = \sum_{p \in \mathcal{P}} \nu(\mathcal{V}_2(p))$.

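Let $\mathcal{V}' = \mathcal{V}'_1$ if $\nu(\mathcal{V}'_1) \ge \nu(\mathcal{V}'_2)$, and let $\mathcal{V}' = \mathcal{V}'_2$ otherwise. Let $\mathcal{D}'^R$ be the recursive decomposition
of $C'$ defined in a natural way as $\mathcal{D}'^R = (S, \{\mathcal{D}^R(p) : p \in \mathcal{P}\})$. Then, using $c > d$ in the last inequality,

$$\begin{aligned}
\nu(\mathcal{V}') &= \max\{\nu(\mathcal{V}'_1), \nu(\mathcal{V}'_2)\} \ge \tfrac{1}{2}\nu(\mathcal{V}'_1) + \tfrac{1}{2}\nu(\mathcal{V}'_2)\\
&\ge \tfrac{1}{2} c \sum_{p \in \mathcal{P}} \log|p| + \tfrac{1}{2} d \sum_{p \in \mathcal{P}} \big[\nu(\mathcal{D}^R(p)) - \log|p|\big]\\
&\ge \tfrac{1}{2} d \sum_{p \in \mathcal{P}} \nu(\mathcal{D}^R(p)) = \tfrac{d}{2}\,\nu(\mathcal{D}'^R).
\end{aligned}$$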

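Define $\mathcal{D}^R$ and $\mathcal{V}$ on $G_s$ by removing the artificial vertices from $\mathcal{D}'^R$ and $\mathcal{V}'$. $\mathcal{D}^R$ remains a
recursive decomposition of $C$, $\mathcal{V}$ remains a vine decomposition of $G_s$, and $\nu(\mathcal{D}'^R) \ge \nu(\mathcal{D}^R)$. There
are at most $\ell + t$ vines in $\mathcal{V}'$ and the value of each is reduced by at most $\log 16 = 4$, so
$\nu(\mathcal{V}) \ge \nu(\mathcal{V}') - 4(\ell + t)$. Therefore

$$5(\ell + t) + \nu(\mathcal{V}) \ge (\ell + t) + \nu(\mathcal{V}') \ge \ell + t + \tfrac{d}{2}\,\nu(\mathcal{D}'^R) = \Omega\big(\ell + t + \nu(\mathcal{D}^R)\big),$$

and since $\max\{\ell + t, \nu(\mathcal{V})\} \ge \frac{1}{6}\big(5(\ell + t) + \nu(\mathcal{V})\big)$, the lemma follows. □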

The vine decomposition $\mathcal{V}$ in Lemma 5.21 is of $G_s$ instead of $G_{III}$, and furthermore it might
have more than $k + 1$ vertices. Thus we are not quite done yet. The following lemma allows us to
reduce the number of vertices in $\mathcal{V}$ to $k + 1$, or to find a "big cycle" in it. The proof appears in
Section 5.4.

Lemma 5.22. Given a complex $C = (B_1, Q)$ on the vertex set $V$, a vine decomposition $\mathcal{V} =
(B_2, \mathcal{P})$ of $C$, and an integer $h > 0$, one of the following holds:

1. $\max_{q \in Q}|q| \le 8(h + 1)$.

2. $\exists p \in \mathcal{P}$ such that $\log|p| \ge \nu(\mathcal{V})/24$.

3. There exist a set of vertices $T \subseteq \bigcup_{q \in Q} V[q]$ and a vine decomposition $\mathcal{V}'$ on the vertex set $V \setminus T$,
such that $|T| \ge h$ and $\nu(\mathcal{V}') \ge \nu(\mathcal{V})/24$. Furthermore, for each $q \in Q$, $T \cap q$ is a subpath of
$q$, and at least one of its endpoints is adjacent to the backbone of $\mathcal{V}'$.

We are now ready to conclude the proof of Lemma 4.2.

Proof of Lemma 4.2. At the end of sub-phase II, $(B, \Pi)$ is a vine decomposition. $(B, \Pi)$ induces a
complex $C = (B, \{q_1, \dots, q_t\})$ on the simplification graph $G_s$.

Let $\mathcal{D}^R = \mathcal{D}^R(C)$ be the recursive decomposition of $C = (B, Q)$ and $\mathcal{V}$ the vine decomposition of
$C$ obtained by Lemma 5.21. From Lemma 5.12 we know that $f_3 = O(g(\ell + t + \nu_g(\mathcal{D}^R)))$. Hence,
we are left to prove that $\ell + t + \nu_g(\mathcal{D}^R) = O(r^\infty(G_{III}, k))$.

First we observe that $\ell = O(r^\infty(G_{III}, k))$. This is true since, by Proposition 2.4, there exists a
tree on $k + 1$ vertices with $\Omega(\ell)$ leaves, so by Lemma 2.3, $\ell = O(r^\infty(G_{III}, k))$.

Next, we observe that $t = O(r^\infty(G_{III}, k))$. Indeed, by disconnecting the paths $q_1, \dots, q_t$ at
their midpoints and removing $g - 1$ vertices from these paths, we get a subgraph on $k + 1$
vertices with at least $t/2$ leaves, so $t = O(r^\infty(G_{III}, k))$. (Note that the case $\sum_i |q_i| < 2g$ is easy.)

We are left to prove that $\nu_g(\mathcal{D}^R) = O(r^\infty(G_{III}, k))$. If $\nu(\mathcal{V}) < \ell + t$, then by Lemma 5.21,
$\nu_g(\mathcal{D}^R) \le \nu(\mathcal{D}^R) = O(\ell + t) = O(r^\infty(G_{III}, k))$, and we are done. Otherwise, we apply Lemma 5.22
with $h = g - 1$ to the vine decomposition $\mathcal{V}$ of the complex $C$. One of the following must happen:

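1. If $\max_{q \in Q}|q| \le 8g$, then, since every sub-path has length at most $8g$ and $|\mathcal{P}_g[\mathcal{D}^R]| \le \ell + t$ (Proposition 5.10), $\nu_g(\mathcal{D}^R) \le |\mathcal{P}_g[\mathcal{D}^R]|(\log 8g - \log g) = O(\ell + t) = O(r^\infty(G_{III}, k))$.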
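2. If $\exists p \in \mathcal{P}$ such that $\log|p| \ge \nu(\mathcal{V})/24$, then let $q_1 \in Q$ be the longest path in $C$, so $|q_1| \ge |p|$,
and $\log|q_1| \ge \log|p| \ge \nu(\mathcal{V})/24 \ge d\,\nu(\mathcal{D}^R)$, for some global constant $d > 0$.

If $|\mathcal{P}_g[\mathcal{D}^R]| \ge 1/d$, then

$$\nu_g(\mathcal{D}^R) \le \nu(\mathcal{D}^R) - |\mathcal{P}_g[\mathcal{D}^R]|\log g \le \tfrac{1}{d}\log|q_1| - \tfrac{1}{d}\log g = \tfrac{1}{d}(\log|q_1| - \log g).$$

If, on the other hand, $|\mathcal{P}_g[\mathcal{D}^R]| < 1/d$, then, since every $p \in \mathcal{P}[\mathcal{D}^R]$ satisfies $|p| \le |q_1|$,

$$\nu_g(\mathcal{D}^R) \le |\mathcal{P}_g[\mathcal{D}^R]|\,(\log|q_1| - \log g) < \tfrac{1}{d}(\log|q_1| - \log g).$$

In either case, $\nu_g(\mathcal{D}^R) = O(\log|q_1| - \log g)$. From Lemma 2.7, $\log|q_1| - \log g = O(r^\infty(G_{III}, k))$,
thus $\nu_g(\mathcal{D}^R) = O(r^\infty(G_{III}, k))$.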

3. There exist a set $T$ of vertices in $G_s$ with $|T| \ge g - 1$ and a vine decomposition $\mathcal{V}'$ on the set of
vertices of $G_s \setminus T$ such that $\nu(\mathcal{V}') \ge \nu(\mathcal{V})/24$. $\mathcal{V}'$ is a vine decomposition in $G_s$; we transform
it into a vine decomposition in $G_{III}$ as follows. First, each edge of the backbone of $\mathcal{V}'$ that does
not appear in $G_{III}$ is replaced with a path of vertices in $V[G_{III}] \setminus V[G_{II}]$ that realizes this
edge. The resulting vine decomposition still has at most $k + 1$ vertices. Next, we augment
this vine decomposition to have exactly $k + 1$ vertices by adding vertices removed from $\mathcal{V}$
back to the backbone. This is done by adding vertices from $T$ to the backbone in exactly the
amount needed to reach $k + 1$ vertices in the vine decomposition. Since the vertices of $T$ form subpaths of
the paths of the complex $C$, each with at least one endpoint adjacent to the backbone of
$\mathcal{V}'$, we can add them in such a way that the augmented backbone remains connected.

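The resulting vine decomposition $\mathcal{V}''$ has the same value as $\mathcal{V}'$. From Lemma 2.6, $\nu(\mathcal{V}'') =
O(r^\infty(G_{III}, k))$. Thus, $\nu_g(\mathcal{D}^R) \le \nu(\mathcal{D}^R) = O(\nu(\mathcal{V})) = O(\nu(\mathcal{V}'')) = O(r^\infty(G_{III}, k))$. □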

5.4 Proofs of the Combinatorial Lemmas 

In this section we supply the proofs omitted from the previous section. 

Proof of Lemma 5.19

Constructing the irreducible decomposition is done using refinements. 

Definition 5.23. A refinement of a decomposition $\mathcal{D} = (S, \mathcal{P})$ is a decomposition $\mathcal{D}' = (S', \mathcal{P}')$ of
the same object such that $S \subseteq S'$ and the paths in $\mathcal{P}'$ are sub-paths of the paths in $\mathcal{P}$.

The next two lemmas show how to construct an irreducible decomposition along with a corresponding
vine selection.

Lemma 5.24. For every proper path $p$ there exist a decomposition $\mathcal{D} = (S, \mathcal{P})$ and a vine selection
$A(\mathcal{D})$ such that $S$ includes the endpoints of $p$.²

² $\mathcal{D}$ is not necessarily a proper decomposition. Properness is dealt with in Lemma 5.25.
[Figure 2 panels: Case 1 ($r_{i-1} > l_i$) and Case 2 ($r_{i-1} < l_i$), each showing a "sub-backbone" and an edge in $E'$.]

Figure 2: The different cases in the proof of Lemma 5.24. In case 1 the "sub-backbone" path is
chosen to be smaller than an adjacent vine. In case 2 the "sub-backbone" path is chosen to be
$(v_{r_{i-1}+1}, \dots, v_{l_i-1})$, which is smaller than $(v_{\max\{r_{i-2}, l_{i-1}\}+1}, \dots, v_{r_{i-1}-1})$.

Proof. Let $p = (v_1, \dots, v_m)$. $S$ is constructed in two stages. A path-edge $v_iv_{i+1}$ is called
"covered" if there exists an edge $e = v_jv_l$ such that $j \le i < i + 1 \le l$ and $\{j, l\} \ne \{i, i + 1\}$.
Remove the uncovered path-edges of $p$, and consider the resulting connected components of the graph
induced on $p$. Note that these connected components are sub-paths of $p$. Singletons are put in $S$.
We consider each non-singleton sub-path individually, finding a decomposition and a vine selection
for each. Combining the decompositions and vine selections for the sub-paths gives the required
result. Note that each such sub-path is a proper sub-path and each of its path-edges is "covered" by
an edge whose endpoints are in that sub-path.
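The first stage of this proof, which finds the covered path-edges and the resulting components, is mechanical. The following is a minimal sketch, assuming $p$ is a list of vertices and $G_s$ is an adjacency dictionary; `covered_components` is a hypothetical helper. Only edges of $G_s$ with both endpoints on $p$ are examined. Such an edge can cover a path-edge other than itself only if it spans at least two positions.

```python
from typing import Dict, Hashable, List, Set

Graph = Dict[Hashable, Set[Hashable]]


def covered_components(gs: Graph, p: List[Hashable]) -> List[List[Hashable]]:
    """Drop the uncovered path-edges of p and return the resulting sub-paths in order."""
    pos = {v: i for i, v in enumerate(p)}
    covered = [False] * (len(p) - 1)   # covered[i] refers to the path-edge v_i v_{i+1}
    for v in p:
        for w in gs[v]:
            if w in pos:
                j, l = sorted((pos[v], pos[w]))
                if l - j >= 2:          # a chord covers every path-edge it jumps over
                    for i in range(j, l):
                        covered[i] = True
    components: List[List[Hashable]] = []
    current = [p[0]]
    for i, is_covered in enumerate(covered):
        if is_covered:
            current.append(p[i + 1])
        else:                           # an uncovered path-edge separates two components
            components.append(current)
            current = [p[i + 1]]
    components.append(current)
    return components
```

Components of size one are the singletons that the proof puts into $S$.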

From now on, assume $p' = (v_1, \dots, v_m)$ is a proper path whose path-edges are all covered by
some edge whose endpoints are in $V[p']$.

We choose a set of edges $E' = \{e_1, \dots, e_{m'}\}$ by induction as follows: let $l_1 = 1$ and $e_1 = v_1v_{r_1}$
for the maximum available $r_1$. Assume inductively that $e_1 = v_{l_1}v_{r_1}, \dots, e_{i-1} = v_{l_{i-1}}v_{r_{i-1}}$, where
$r_{i-1} < m$, were already chosen. Consider the edge $e = v_lv_r$ such that $l \le r_{i-1}$ and $r$ is maximized
under this condition. Note that $r > r_{i-1}$, since otherwise $v_{r_{i-1}}v_{r_{i-1}+1}$ is not covered. We consider
two cases:

1. If $r_{i-1} - l < r - r_{i-1}$, then we set $e_i = e$.

2. Otherwise, consider the edge $e' = v_{l'}v_{r'}$ such that $l' \le r$ and $r'$ is maximized under this
condition. Note that $r' > r$. We set $e_i = e'$.

We continue until $v_m$ is reached, and let $r_{m'} = m$. It is easily checked that $r_{i-1} < l_{i+1}$ for $1 \le i \le m'$
(taking $r_0 = 1$ and $l_{m'+1} = m$), and that both $(l_i)_i$ and $(r_i)_i$ are strictly increasing sequences.

We construct $\mathcal{D}$ by adding to $S$ the endpoints of the edges in $E'$. Let $p'_1, \dots, p'_{s'}$ denote the
resulting decomposed sub-paths, in their order in $p'$. The vine selection $A(\mathcal{D})$ is constructed by
taking roughly half of the sub-paths as vines, as described below (see also Fig. 2). For $2 \le i \le m'$:

1. If $r_{i-1} > l_i$, let $h = 1 + \max\{l_{i-1}, r_{i-2}\}$. We declare the shorter of the two sub-paths
$(v_h, \dots, v_{l_i-1})$ and $(v_{l_i+1}, \dots, v_{r_{i-1}-1})$ a "sub-backbone", i.e., it will not be part of $A(\mathcal{D})$.

2. If $r_{i-1} < l_i$, we declare the sub-path $(v_{r_{i-1}+1}, \dots, v_{l_i-1})$ a "sub-backbone".

$A(\mathcal{D})$ consists of all the other sub-paths, i.e., those not declared as "sub-backbones". It is easily checked
that the sub-graph induced on $S \cup \bigcup_{p'_i \notin A(\mathcal{D})} V[p'_i]$ is connected and adjacent to the endpoints of every
$p'_i \in A(\mathcal{D})$.

In case 1, the sub-path not declared as a sub-backbone is in $A(\mathcal{D})$ and is bigger. In case 2,
the sub-path $(v_{\max\{r_{i-2}, l_{i-1}\}+1}, \dots, v_{r_{i-1}-1})$ is in $A(\mathcal{D})$ and is bigger than the sub-backbone $(v_{r_{i-1}+1}, \dots, v_{l_i-1})$.
Hence every sub-path not selected to $A(\mathcal{D})$ has an adjacent sub-path in $A(\mathcal{D})$ which is bigger. This
means that we have constructed a mapping $f : \{p'_1, p'_2, \dots, p'_{s'}\} \to A(\mathcal{D})$ such that $|f^{-1}(\{p'_i\})| \le 3$
for all $p'_i \in A(\mathcal{D})$, and $|p'_i| \le |f(p'_i)|$ for all $i \in \{1, \dots, s'\}$. Therefore

$$\sum_{p'_i \in A(\mathcal{D})} \log(|p'_i| + 1) \ge \frac{1}{3} \sum_{i=1}^{s'} \log(|p'_i| + 1).$$

□

Lemma 5.25. Let $\mathcal{D}^1 = (S^1, \mathcal{P}^1)$ be an initial decomposition in which every maximal path of degree-two vertices in $\mathcal{P}^1$ contains at least 15 vertices, and let $A(\mathcal{D}^1)$ be a vine selection for it. Then there exist an irreducible decomposition $\mathcal{D} = (S, \mathcal{P})$ refining $\mathcal{D}^1$ and a vine selection $A(\mathcal{D})$.

Note: The lemma holds both for a decomposition of a complex and for a decomposition of a proper path. We denote by $B$ the backbone of the complex, and we assume $B = \emptyset$ for a decomposition of a proper path.

Proof. The proof is by induction. Given a decomposition $\mathcal{D} = (S, \mathcal{P})$ and a vine selection $A(\mathcal{D})$ such that $\sum_{p \in A(\mathcal{D})} \log|p| \ge c \sum_{p \in \mathcal{P}} \log|p|$: if $\mathcal{D}$ is an irreducible decomposition, then we are done. Otherwise, there exists an edge $e = uv$ that contradicts $\mathcal{D}$ being irreducible. We call $e$ a violating edge.

In this case we build a refinement $\mathcal{D}' = (S', \mathcal{P}')$ from $\mathcal{D}$ as follows.

• $S' = S \cup (\{u, v\} \setminus B)$.

• If $u, v \in p$ for some $p \in \mathcal{P}$, we decompose $p$ into three sub-paths $p_1$, $p_2$ and $p_3$. Then define $\mathcal{P}' = \mathcal{P} \cup \{p_1, p_2, p_3\} \setminus \{p\}$.

• If $u \in p$ and $v \in q$, where $p, q \in \mathcal{P}$ and $p \ne q$, we decompose $p$ into two sub-paths $p_1$ and $p_2$, and $q$ into two sub-paths $q_1$ and $q_2$. Then define $\mathcal{P}' = \mathcal{P} \cup \{p_1, p_2, q_1, q_2\} \setminus \{p, q\}$.

$\mathcal{D}'$ is obviously a legal decomposition. Moreover, the set of violating edges shrinks ($e$ becomes a non-violating edge). Therefore this process is finite, and at the end we get an irreducible decomposition.

We are left to find a vine selection $A(\mathcal{D}')$ for $\mathcal{D}'$ such that

$$\sum_{p' \in A(\mathcal{D}')} \log|p'| \ge c \sum_{p' \in \mathcal{P}'} \log|p'|. \qquad (3)$$
Figure 3: The different cases in Lemma 5.25. In case 2, the edge $e = uv$ is between a path and the backbone. In case 3, $e$ is between two different paths. In case 4, $e$ is inside a single path. The different sub-cases correspond to the different possibilities for the paths to be either in the vine selection or sub-backbones.



Denote

$$\Delta = \sum_{p' \in \mathcal{P}'} \log|p'| - \sum_{p \in \mathcal{P}} \log|p|,$$

$$\Lambda = \sum_{p' \in A(\mathcal{D}')} \log|p'| - \sum_{p \in A(\mathcal{D})} \log|p|.$$

Note that $\Delta > 0$. We will see how to choose $A(\mathcal{D}')$ such that $\Lambda \ge c\Delta$ (proving equation (3)), and such that the subgraph spanned by $B \cup S' \cup \bigcup_{p' \in \mathcal{P}' \setminus A(\mathcal{D}')} p'$ remains connected.

We denote by $B(\mathcal{D})$ the subgraph induced on $B \cup S \cup \bigcup_{p \in \mathcal{P} \setminus A(\mathcal{D})} p$, and by $B(\mathcal{D}')$ the subgraph induced on $B \cup S' \cup \bigcup_{p' \in \mathcal{P}' \setminus A(\mathcal{D}')} p'$; i.e., $B(\mathcal{D})$ is the backbone of the vine decomposition in the old decomposition, and $B(\mathcal{D}')$ is the backbone of the vine decomposition after the refinement.

We use the notation $n_i = |p'_i|$. From the assumptions of the lemma, $n_i \ge 16$, and we make use of the fact that for $n \ge 16$, $\log n - 2 \ge \frac{1}{2}\log n$.

We do a case analysis according to the locations of $u$ and $v$. The different cases are also illustrated in Fig. 3.
1. Both $u$ and $v$ are not on paths in $\mathcal{P}$ (they are in $S \cup B$). In this case $e$ is not a violating edge, which is impossible.

2. One of $u$ or $v$ is on a path in $\mathcal{D}$ and the other is not. In this case $|\mathcal{P}'| = |\mathcal{P}| + 1$. Assume $v \in p$ for some $p \in \mathcal{P}$, and $u \in S \cup B$, so $p = p'_1 \cup \{v\} \cup p'_2$, where $p'_1, p'_2 \in \mathcal{P}'$. As $|p| = n_1 + n_2$ (since $|\cdot|$ is defined to be the number of vertices plus one), $\Delta = \log\frac{n_1 n_2}{n_1 + n_2}$.

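(a) If $p \in A(\mathcal{D})$, then we construct $A(\mathcal{D}') = A(\mathcal{D}) \cup \{p'_1, p'_2\} \setminus \{p\}$. $B(\mathcal{D}')$ is connected, and $\Lambda = \log\frac{n_1 n_2}{n_1 + n_2}$, so $\Lambda \ge \Delta$.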

(b) If $p \notin A(\mathcal{D})$, then there is a simple path in $B(\mathcal{D})$ between $u$ and $v$. Without loss of generality, assume the path passes through $p'_2$. Construct $A(\mathcal{D}') = A(\mathcal{D}) \cup \{p'_2\} \setminus \{p\}$. $B(\mathcal{D}')$ is connected, and $\Lambda = \log n_2$, so $\Lambda \ge \log n_2 + \log\frac{n_1}{n_1 + n_2} = \Delta$.

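3. $u, v$ are on different paths: $u \in p$ and $v \in q$ for some $p, q \in \mathcal{P}$ with $p \ne q$. In this case $|\mathcal{P}'| = |\mathcal{P}| + 2$. Assume $p = p'_1 \cup \{u\} \cup p'_2$ and $q = p'_3 \cup \{v\} \cup p'_4$. Here $\Delta = \log\frac{n_1 n_2 n_3 n_4}{(n_1 + n_2)(n_3 + n_4)}$.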

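(a) If $p, q \in A(\mathcal{D})$: without loss of generality assume that $n_1 = \min\{n_1, n_2, n_3, n_4\}$. Construct $A(\mathcal{D}') = A(\mathcal{D}) \cup \{p'_2, p'_3, p'_4\} \setminus \{p, q\}$. Obviously $B(\mathcal{D}')$ is connected, and $\Lambda = \log\frac{n_2 n_3 n_4}{(n_1 + n_2)(n_3 + n_4)}$. Without loss of generality, assume $n_3 \le n_4$, so $\log\frac{n_2 n_4}{(n_1 + n_2)(n_3 + n_4)} \ge -2$. Thus

$$\Lambda = \log n_3 + \log\frac{n_2 n_4}{(n_1 + n_2)(n_3 + n_4)} \ge \log n_3 - 2 \ge \frac{1}{2}\log n_3 \ge \frac{1}{4}(\log n_3 + \log n_1) \ge \frac{1}{4}\left[\log n_3 + \log n_1 + \log\frac{n_2 n_4}{(n_1 + n_2)(n_3 + n_4)}\right] = \frac{1}{4}\Delta.$$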



(b) If $p \in A(\mathcal{D})$ and $q \notin A(\mathcal{D})$ (the case $q \in A(\mathcal{D})$ and $p \notin A(\mathcal{D})$ is similar): without loss of generality, $n_1 \le n_2$. $p'_1$ has two endpoints, one of which is adjacent to $u$. The other endpoint of $p'_1$ is connected via a simple path in $B(\mathcal{D})$ to $v$. This path must contain either $p'_3$ or $p'_4$; without loss of generality, assume it contains $p'_3$. If $n_1 \ge n_3$, then we construct $A(\mathcal{D}') = A(\mathcal{D}) \cup \{p'_1, p'_2\} \setminus \{p\}$. If, on the other hand, $n_1 < n_3$, then we construct $A(\mathcal{D}') = A(\mathcal{D}) \cup \{p'_3, p'_2\} \setminus \{p\}$. In either case, $B(\mathcal{D}')$ is connected, and

$$\begin{aligned}
\Lambda &= \log\frac{\max\{n_1, n_3\} \cdot n_2}{n_1 + n_2} = \log\max\{n_1, n_3\} + \log\frac{n_2}{n_1 + n_2} \ge \frac{1}{2}\max\{\log n_1, \log n_3\} \\
&\ge \frac{1}{4}(\log n_1 + \log n_3) \ge \frac{1}{4}(\log n_1 + \log\min\{n_3, n_4\}) \\
&\ge \frac{1}{4}\left[\log n_1 + \log\min\{n_3, n_4\} + \log\frac{n_2 \max\{n_3, n_4\}}{(n_1 + n_2)(n_3 + n_4)}\right] = \frac{1}{4}\Delta.
\end{aligned}$$



(c) If $p, q \notin A(\mathcal{D})$, then there is a simple path in $B(\mathcal{D})$ from $u$ to $v$. Without loss of generality, assume it passes through $p'_1$ and $p'_3$, and that $n_1 \le n_3$. Construct $A(\mathcal{D}') = A(\mathcal{D}) \cup \{p'_3\}$. $B(\mathcal{D}')$ is connected, and

$$\Lambda = \log n_3 \ge \frac{1}{2}\left[\log\min\{n_1, n_2\} + \log\min\{n_3, n_4\}\right] \ge \frac{1}{2}\Delta.$$

4. $u, v$ are on the same path. In this case $|\mathcal{P}'| = |\mathcal{P}| + 2$. Assume $p = p'_1 \cup \{u\} \cup p'_2 \cup \{v\} \cup p'_3$, for some $p \in \mathcal{P}$ and $p'_1, p'_2, p'_3 \in \mathcal{P}'$. So $\Delta = \log\frac{n_1 n_2 n_3}{n_1 + n_2 + n_3}$. Here $e = uv$ does not contradict $\mathcal{D}$ being proper, implying that $n_2 \ge (n_1 + n_2 + n_3)^{1/4}$, so $\log n_2 \ge \frac{1}{4}\log\max\{n_1, n_2, n_3\}$. Without loss of generality, assume that $n_1 \le n_3$.
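(a) If $p \in A(\mathcal{D})$, then construct $A(\mathcal{D}') = A(\mathcal{D}) \cup \{p'_2, p'_3\} \setminus \{p\}$. Then

$$\begin{aligned}
\Lambda &= \log\frac{n_2 n_3}{n_1 + n_2 + n_3} = \log\min\{n_2, n_3\} + \log\frac{\max\{n_2, n_3\}}{n_1 + n_2 + n_3} \ge \log\min\{n_2, n_3\} - \log 3 \\
&\ge \frac{1}{2}\log\min\{n_2, n_3\} \ge \frac{1}{10}\left(\log n_1 + \log\min\{n_2, n_3\}\right) \ge \frac{1}{10}\Delta,
\end{aligned}$$

where the second-to-last step uses $\log n_1 \le 4\log\min\{n_2, n_3\}$, which follows from $n_1 \le n_3$ and $\log n_2 \ge \frac{1}{4}\log\max\{n_1, n_2, n_3\}$.

(b) If $p \notin A(\mathcal{D})$, then construct $A(\mathcal{D}') = A(\mathcal{D}) \cup \{p'_2\}$. Then

$$\Lambda = \log n_2 \ge \frac{1}{8} \cdot 2\log\max\{n_1, n_2, n_3\} \ge \frac{1}{8}\Delta.$$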

□ 
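To spot-check the constants in cases 3 and 4, the short Python script below evaluates the vine gain $\Lambda$ and the total change $\Delta$ (logarithms base 2, the base under which $\log n - 2 \ge \frac{1}{2}\log n$ holds for $n \ge 16$) on a grid of part sizes $n_i \ge 16$. It encodes one reading of the sub-case rules: in 3(a) the smallest part is dropped, in 3(b) and 3(c) the worst choice of the backbone path is taken, and in case 4 only triples with $n_1 \le n_3$ and $n_2^4 \ge n_1 + n_2 + n_3$ are sampled. The helper names are illustrative, and a finite grid is a sanity check, not a proof.

```python
import math
from itertools import product

def lg(x: float) -> float:
    return math.log2(x)

def case3_ok(n1: int, n2: int, n3: int, n4: int) -> bool:
    """Case 3: p splits into parts of sizes n1, n2 and q into parts n3, n4."""
    delta = lg(n1 * n2 * n3 * n4 / ((n1 + n2) * (n3 + n4)))
    parts = [n1, n2, n3, n4]
    # (a) p and q were vines: the three largest parts become vines
    lam_a = sum(lg(x) for x in parts) - lg(min(parts)) - lg(n1 + n2) - lg(n3 + n4)
    # (b) p was a vine, q backbone: worst choice of the q-part y on the backbone path
    small, large = sorted((n1, n2))
    lam_b = min(lg(max(small, y)) + lg(large / (n1 + n2)) for y in (n3, n4))
    # (c) neither was a vine: the larger of the two parts on the u-v backbone path
    lam_c = min(lg(max(x, y)) for x in (n1, n2) for y in (n3, n4))
    return lam_a >= delta / 4 and lam_b >= delta / 4 and lam_c >= delta / 2

def case4_ok(n1: int, n2: int, n3: int) -> bool:
    """Case 4: p splits into parts n1, n2, n3 (both endpoints of e lie on p)."""
    s = n1 + n2 + n3
    delta = lg(n1 * n2 * n3 / s)
    lam_a = lg(n2 * n3 / s)  # (a) p was a vine: p'_2 and p'_3 become vines
    lam_b = lg(n2)           # (b) p was backbone: p'_2 becomes a vine
    return lam_a >= delta / 10 and lam_b >= delta / 8

def check_all(sizes=(16, 17, 24, 64, 500, 4096, 10**5, 10**7)) -> bool:
    ok3 = all(case3_ok(*ns) for ns in product(sizes, repeat=4))
    ok4 = all(case4_ok(n1, n2, n3)
              for n1, n2, n3 in product(sizes, repeat=3)
              # WLOG n1 <= n3; properness of e gives n2 >= (n1 + n2 + n3)^(1/4)
              if n1 <= n3 and n2 ** 4 >= n1 + n2 + n3)
    return ok3 and ok4
```

Calling `check_all()` on the default grid returns `True`; enlarging `sizes` is a cheap way to probe a reconstructed constant before relying on it.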

We combine Lemmas 5.24 and 5.25.

Proof of Lemma 5.19. We first find an initial decomposition (not necessarily proper) and a corresponding initial vine selection, as required by Lemma 5.25. For a proper path, Lemma 5.24 gives the needed initial decomposition and vine selection. For a complex $(B, Q)$, we take $(\emptyset, Q)$ as the initial decomposition, and $Q$ as the corresponding initial vine selection. Using Lemma 5.25 we obtain an irreducible decomposition and a corresponding vine selection. □

Proof of Lemma 5.20

We need the following definitions: 

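Definition 5.26. Define a sequence $(c_m)_{m \in \mathbb{N}}$ recursively as follows:

$$c_m = \begin{cases} c/4 & m \le 8^4, \\ c_{\lceil m^{1/4} \rceil} \cdot \left(1 - \dfrac{5}{m^{3/16}}\right) & \text{otherwise.} \end{cases}$$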

We define a "good" vine decomposition $\mathcal{V}$ of a proper path $q$, related to a recursive decomposition $\mathcal{D}^R(q)$, to be a proper vine decomposition of $q$ such that

• For every path $p \in \mathcal{V}$, $|p| \le |q|^{1/4}$.

• $\nu(\mathcal{V}) \ge c_{|q|} \cdot \left[\nu(\mathcal{D}^R(q)) - \log|q|\right]$.

Proposition 5.27. There exists $c' > 0$ such that for all $m$, $c_m \ge c'$.

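Proof. Define $c'_j = \frac{c}{4}\prod_{i=2}^{j}\left(1 - \frac{5}{(8^{4^i})^{3/16}}\right)$ and $c' = \lim_{j \to \infty} c'_j$. It is easily seen that $(c_m)_{m > 0}$ and $(c'_j)_{j > 0}$ are non-increasing and that $c_{8^{4^j}} = c'_j$. We have $c' > 0$, since $1 > \frac{5}{(8^{4^i})^{3/16}}$ for all $i \ge 2$ and

$$\sum_{i \ge 2} \frac{5}{(8^{4^i})^{3/16}} < \infty.$$

□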

Thus, constructing a recursive irreducible decomposition along with a "good" vine decomposition would suffice to prove Lemma 5.20. The following lemma is the inductive argument that allows the construction.

Lemma 5.28. Let $\mathcal{D}^R(q) = (S, \{\mathcal{D}^R(p_1), \ldots, \mathcal{D}^R(p_s)\})$ be a recursive decomposition of a proper path $q$ such that $\mathcal{D} = (S, \{p_1, \ldots, p_s\})$ is an irreducible decomposition. Assume that:

• There is a vine selection $A(\mathcal{D})$ of $\mathcal{D}$.

• For every sub-path $p_i$, there is a "good" vine decomposition $\mathcal{V}(p_i)$ related to $\mathcal{D}^R(p_i)$.

Then we can construct $\mathcal{V}$, a "good" vine decomposition related to $\mathcal{D}^R(q)$.

Proof. Denote $m_i = |p_i|$ and $M = |q|$. From the irreducibility property of $\mathcal{D}^R(q)$, $m_i < M^{1/4}$. For every $p \in A(\mathcal{D})$ we do the following.

Let $m = |p|$. From the assumptions, the vines in $\mathcal{V}(p) = (B(p), V(p))$ have lengths at most $m^{1/4}$. We partition $p$ into $t = \lceil m/(5m^{1/4}) \rceil = \lceil m^{3/4}/5 \rceil$ consecutive sub-paths $C_1, \ldots, C_t$, each of size between $(5-2)m^{1/4}$ and $(5+2)m^{1/4}$, such that no sub-path splits a vine of $\mathcal{V}(p)$. Let $X_i$ be the sum of the values of the vines in $C_i$; thus $\sum_i X_i = \nu(\mathcal{V}(p))$. Let $C = C_{i_0}$ be the sub-path that satisfies $X = X_{i_0} = \min_i\{X_i\}$, so $X \le 5m^{-3/4}\,\nu(\mathcal{V}(p))$. $C$ is partitioned into three consecutive sub-paths $C_{\mathrm{lft}}, C_{\mathrm{mid}}, C_{\mathrm{rgt}}$ such that $|C_{\mathrm{mid}}| = m^{1/4}$, $|C_{\mathrm{lft}}| \ge m^{1/4}$, and $|C_{\mathrm{rgt}}| \ge m^{1/4}$.
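The chunking step can be sketched as follows, assuming the vines of $\mathcal{V}(p)$ are given as inclusive position intervals on $p$ and that each vine contributes $\log_2|v|$ (with $|v|$ the number of vertices plus one). The function `cheapest_chunk` is hypothetical: it targets chunks of $5m^{1/4}$ vertices and moves any cut that falls inside a vine to that vine's last vertex, so chunks may exceed the target by up to one vine length and the final chunk may be short, a looser guarantee than the $(5 \pm 2)m^{1/4}$ sizes in the text. What it does reproduce is the averaging step: the cheapest of the $t$ chunks carries at most a $1/t$ share of $\nu(\mathcal{V}(p))$.

```python
import math
from typing import Dict, List, Tuple

Vine = Tuple[int, int]  # (first, last) vertex positions of a vine on p, inclusive

def vine_value(v: Vine) -> float:
    # assumed value of a vine: log2|v|, where |v| = number of vertices + 1
    return math.log2(v[1] - v[0] + 2)

def cheapest_chunk(m: int, vines: List[Vine]) -> Tuple[Tuple[int, int], float, int]:
    """Cut positions 0..m-1 into consecutive chunks of roughly 5*m**(1/4) vertices
    without splitting any vine; return (chunk, X, t) for the chunk of least vine value X."""
    target = max(1, round(5 * m ** 0.25))
    vine_end: Dict[int, int] = {}  # position -> last position of the vine covering it
    for a, b in vines:
        for x in range(a, b + 1):
            vine_end[x] = b
    chunks: List[Tuple[int, int]] = []
    start = 0
    while start < m:
        end = min(m - 1, start + target - 1)
        end = vine_end.get(end, end)  # a cut inside a vine moves to the vine's last vertex
        chunks.append((start, end))
        start = end + 1

    def value(chunk: Tuple[int, int]) -> float:
        lo, hi = chunk
        return sum(vine_value(v) for v in vines if lo <= v[0] and v[1] <= hi)

    best = min(chunks, key=value)
    return best, value(best), len(chunks)
```

Because every vine lands in exactly one chunk, the chunk values sum to $\nu(\mathcal{V}(p))$, and the minimum is at most their average.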

Let $p_{\mathrm{lft}}, p_{\mathrm{rgt}}$ denote the sub-paths of $p$ to the left and to the right (respectively) of $C_{\mathrm{mid}}$. Let $B(p)$ denote the backbone of $\mathcal{V}(p)$. We construct $\mathcal{V}^{\mathrm{lft}}$, a vine decomposition of $p_{\mathrm{lft}}$, by defining its backbone to be $B^{\mathrm{lft}} = (B(p) \cup C_{\mathrm{lft}}) \cap p_{\mathrm{lft}}$. We claim that $\mathcal{V}^{\mathrm{lft}}$ is a legal vine decomposition, and that the endpoints of $p_{\mathrm{lft}}$ are part of $B^{\mathrm{lft}}$. It is easy to see that the endpoints of the vines in $\mathcal{V}^{\mathrm{lft}}$ are adjacent to $B^{\mathrm{lft}}$ and that the endpoints of $p_{\mathrm{lft}}$ are part of $B^{\mathrm{lft}}$.

We are left to prove that the subgraph spanned by $B^{\mathrm{lft}}$ is connected. First, note that there is a vertex $v_1 \in C_{\mathrm{lft}} \cap B(p)$; otherwise $C_{\mathrm{lft}}$ is part of a vine in $\mathcal{V}(p)$ and its length is at least $m^{1/4}$, which contradicts the assumptions. The subgraph induced on $C_{\mathrm{lft}}$ is obviously connected. For every vertex $u \in B^{\mathrm{lft}} \setminus C_{\mathrm{lft}}$, there is a path $r$ in $B(p)$ from $u$ to $v_1$. Consider the maximal prefix of $r$ which is entirely inside $p_{\mathrm{lft}}$. The last vertex of this prefix must be inside $C_{\mathrm{lft}}$; otherwise there is an edge $e$ in $p$ such that $|e|_p > |C_{\mathrm{lft}}| \ge m^{1/4}$, in contradiction to the irreducibility property of $\mathcal{D}$. Thus $B^{\mathrm{lft}}$ is connected.

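In conclusion, $\mathcal{V}^{\mathrm{lft}}$ is a legal vine decomposition of $p_{\mathrm{lft}}$. Similarly, we construct a vine decomposition $\mathcal{V}^{\mathrm{rgt}}$ of $p_{\mathrm{rgt}}$ such that the endpoints of $p_{\mathrm{rgt}}$ are part of the backbone of $\mathcal{V}^{\mathrm{rgt}}$.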

We construct $\mathcal{V}$, a vine decomposition of $q$, such that every vertex in a sub-path $p \notin A(\mathcal{D})$ has the same role (part of the backbone or part of a vine) as in $\mathcal{V}(p)$. For $p \in A(\mathcal{D})$ we declare $C_{\mathrm{mid}}(p)$ to be a vine, and the other vertices to have the same roles as in $\mathcal{V}^{\mathrm{lft}}(p)$ and $\mathcal{V}^{\mathrm{rgt}}(p)$. $\mathcal{V}$ is a vine decomposition of $q$, as follows from the previous discussion and because the endpoints of the sub-paths are part of the backbone (see Def. 5.15). Also, every vine of $\mathcal{V}$ has length less than $M^{1/4}$.

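Because $c_{m'}$ is non-increasing as a function of $m'$, $c_M = \min_{m' < M^{1/4}} c_{m'}\left(1 - \frac{5}{m'^{3/4}}\right)$. Let $m = |p|$. From the construction, for every $p \in A(\mathcal{D})$,

$$\nu(\mathcal{V}^{\mathrm{lft}}(p)) + \nu(\mathcal{V}^{\mathrm{rgt}}(p)) \ge \left(1 - \frac{5}{m^{3/4}}\right)\nu(\mathcal{V}(p)) \ge \left(1 - \frac{5}{m^{3/4}}\right) c_m\left(\nu(\mathcal{D}^R(p)) - \log m\right) \ge c_M\left(\nu(\mathcal{D}^R(p)) - \log m\right).$$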
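We conclude

$$\begin{aligned}
\nu(\mathcal{V}) &\ge \sum_{p \in A(\mathcal{D})} \frac{1}{4}\log|p| + \sum_{p \in A(\mathcal{D})} c_M\left[\nu(\mathcal{D}^R(p)) - \log|p|\right] + \sum_{p \notin A(\mathcal{D})} c_{|p|}\left[\nu(\mathcal{D}^R(p)) - \log|p|\right] \\
&\ge c_M\left[\sum_{i=1}^{s}\log|p_i| + \sum_{i=1}^{s}\left(\nu(\mathcal{D}^R(p_i)) - \log|p_i|\right)\right].
\end{aligned}$$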

Proof of Lemma 5.20. We show the existence of a recursive decomposition and a "good" vine decomposition by induction on the structure of the recursive decomposition.

The base cases of the induction are simple paths whose inner vertices have no other adjacent vertices. Such a path has a unique recursive decomposition, namely all the vertices are in the separating set of the decomposition. The "good" vine decomposition is also obvious: all the vertices are in the backbone.

Otherwise, we use Lemma 5.19 to get an irreducible decomposition $\mathcal{D} = (S, \mathcal{P})$ and a vine selection $A(\mathcal{D})$. Inductively, each $p \in \mathcal{P}$ has a recursive decomposition $\mathcal{D}^R(p)$ along with a "good" vine decomposition $\mathcal{V}(p)$. Hence all the assumptions of Lemma 5.28 are satisfied, and we get a recursive decomposition along with a "good" vine decomposition for the given proper path. □



Proof of Lemma 5.22

Proof of Lemma 5.22. Assume that the first two cases in the statement of the lemma do not hold. Denote $Q = \{q_1, \ldots, q_t\}$. We divide the vertices of $q_1 \cup \cdots \cup q_t$ among four sets $X_1, X_2, X_3$ and $X_4$ as follows. Let $v_1, v_2, \ldots, v_m$ be the vertices of $q_1 \cup q_2 \cup \cdots \cup q_t$, ordered as in $q_1, q_2, \ldots, q_t$. Let $t_0 = 1$ and $t_4 = m$. For $i \in \{1, 2, 3\}$, let $t_i$ be the smallest index such that $t_i > t_{i-1} + h + 1$ and $v_{t_i}$ is not on a path of $\mathcal{V}$. Now we assign $X_i = \{v_{t_{i-1}}, \ldots, v_{t_i}\}$. Note that the vertices $v_{t_1}, v_{t_2}, v_{t_3}$ might appear in two consecutive sets.

All four sets have at least $h$ unique vertices, unless there exists a path $p \in \mathcal{V}$ such that $|p| > h + 1$. In this case we construct $\mathcal{V}'$ from $\mathcal{V}$ by removing the path $p$. Here $\nu(\mathcal{V}') = \nu(\mathcal{V}) - \log|p| \ge \frac{23}{24}\nu(\mathcal{V})$.

From here on, we assume that all four sets have at least $h$ unique vertices. We denote $\nu(X_i) = \sum_p \log|p|$, where $p$ ranges over the paths in $\mathcal{V}$ contained in $X_i$. Thus $\sum_{i=1}^{4}\nu(X_i) = \nu(\mathcal{V})$. Assume that $\nu(X_{i_4}) = \max_{j \le 4}\nu(X_j)$, so $\nu(X_{i_4}) \ge \nu(\mathcal{V})/4$.

For a vertex $u \in X_{i_4} \cap B_2$, define

$$C_u = \left\{ v \in (q_1 \cup \cdots \cup q_t) \setminus X_{i_4} \;:\; \exists\, r = (u, w_1, \ldots, w_s, v) \text{ a path with } \{w_1, \ldots, w_s\} \subseteq X_{i_4} \cap B_2 \right\}.$$

Note that $C_u \ne \emptyset$. We say that $u \in X_{i_4} \cap B_2$ is locally connected to $X_{i_j}$ if $C_u \subseteq X_{i_j}$. Let $Y_{i_j}$ ($j \le 3$) be the set of paths of $\mathcal{V}$ contained in $X_{i_4}$ having at least one of their endpoints locally connected to $X_{i_j}$. Then $\nu(Y_{i_1}) + \nu(Y_{i_2}) + \nu(Y_{i_3}) \le 2\nu(X_{i_4})$, since each path of $\mathcal{V}$ contained in $X_{i_4}$ is counted at most twice. Assume that $Y_{i_1}$ has the minimal value of the three, so $\nu(Y_{i_1}) \le \frac{2}{3}\nu(X_{i_4})$.

We remove from $\mathcal{V}$ all the unique vertices of $X_{i_1}$. Note that there are indeed enough vertices to remove, since $X_{i_1}$ alone contains at least $h$ vertices that can be removed.
Next, we add the following vertices to the backbone: (i) the vertices of the vines in $Y_{i_1}$; and (ii) all the vertices in $X_{i_2}$ and $X_{i_3}$. The result is a "vine decomposition" $\hat{\mathcal{V}} = (B, \hat{V})$ in which the endpoints of every vine are part of the backbone, and $\nu(\hat{\mathcal{V}}) \ge \frac{1}{3}\nu(X_{i_4}) \ge \frac{1}{12}\nu(\mathcal{V})$.

However, $B$ might be disconnected. It consists of at most 3 connected components: (i) one that contains $B_1$, (ii) one that contains $X_{i_2}$, and (iii) one that contains $X_{i_3}$. A more careful observation reveals that there are at most 2 connected components. Assume $X_{i_2}$ is disconnected from $B_1$ in $B$. In this case $X_{i_2}$ is contained in a single path $q_{i_0} \in Q$. On one side of $X_{i_2}$ (in $q_{i_0}$) there are vertices from $X_{i_1}$, and on the other side vertices from $X_{i_4}$. Now, either $X_{i_3}$ contains an endpoint of one of the paths of $Q$ (in which case it is connected to $B_1$), or it is adjacent to $X_{i_2}$ in $q_{i_0}$ (in which case it is connected to $X_{i_2}$). In both cases there are only two connected components.

In case $B$ is disconnected, we augment it as follows. Denote the connected components of $B$ by $C \supseteq X_{i_2}$ and $D \supseteq B_1$. On $q_{i_0}$, $X_{i_4} \cap q_{i_0}$ separates $X_{i_2}$ from $D \cap (q_{i_0} \cup \{q_{i_0}\text{'s neighbors in } B\})$.

Assume that $C$ is on the left side of $X_{i_4} \cap q_{i_0}$ in $q_{i_0}$. Let $u \in C \cap q_{i_0}$ be the rightmost vertex of $C$ in $q_{i_0}$; $u$ is either in $X_{i_4}$ or adjacent to $X_{i_4}$. Let $v \in D$ be the leftmost vertex of $D$ that is to the right of $u$ (in case no vertex of $q_{i_0}$ to the right of $u$ is in $D$, we take $v$ to be a vertex in $B_1$ adjacent to the rightmost vertex in $q_{i_0}$). The vertices of $q_{i_0}$ between $u$ and $v$ form a path $p \in \hat{V}$, which is also a vine in $\hat{\mathcal{V}}$. We add them to the backbone of $\hat{\mathcal{V}}$. The result is the desired vine decomposition $\mathcal{V}'$, which has at least $h$ fewer vertices than $\mathcal{V}$, and

$$\nu(\mathcal{V}') \ge \nu(\hat{\mathcal{V}}) - \log|p| \ge \frac{1}{12}\nu(\mathcal{V}) - \frac{1}{24}\nu(\mathcal{V}) = \frac{1}{24}\nu(\mathcal{V}).$$

We observe that, from the way $X_1, \ldots, X_4$ were formed, for any path $q \in Q$, $q \cap X_{i_4}$ is indeed a subpath, and one of its endpoints is adjacent to the backbone of $\mathcal{V}'$. □

6 Refined Locality of Reference 

Definition 6.1. An extended access graph on a set of pages $\mathcal{P}$ is a finite labeled undirected graph $G = (V, E, \ell)$, with a labeling function $\ell : V \to \mathcal{P}$ that labels vertices with pages.

A request sequence $\sigma = (\ell(v_i))_{i \ge 1}$ is a finite sequence of labels attached to vertices of $G$ such that either $v_{i+1} = v_i$ or $v_i v_{i+1} \in E$. A paging algorithm should maintain the invariant that, following the $i$-th request, $\ell(v_i)$ is in some page slot. The competitive measures $r_A(G, k)$, $r(G, k)$, $r_{\mathrm{obl}}(G, k)$ and their asymptotic counterparts are defined in a manner similar to the definitions for access graphs in Section 1.2.

For a given extended access graph $G$, we define a parameter $\lambda(G)$ that indicates "how quickly" the locality of reference in $G$ may change.

Definition 6.2.

$$\lambda(G) = \min\{s - 1 : (v_1, v_2, \ldots, v_s) \in \mathrm{paths}(G),\ v_1 \ne v_s,\ \text{and}\ \ell(v_1) = \ell(v_s)\}.$$

As a convention, when $G$ is an ordinary (non-extended) access graph, we fix $\lambda(G) = \infty$.
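Computing $\lambda(G)$ amounts to finding the closest pair of distinct vertices that share a label, which a breadth-first search from every vertex does directly. The sketch below assumes $G$ is given as an adjacency dictionary and a label dictionary (both hypothetical input formats) and returns `math.inf` when all labels are distinct, matching the convention for ordinary access graphs.

```python
import math
from collections import deque
from typing import Dict, Hashable, List

def locality_change_rate(adj: Dict[Hashable, List[Hashable]],
                         label: Dict[Hashable, Hashable]) -> float:
    """lambda(G): length of a shortest path between two distinct equally-labeled vertices."""
    best = math.inf
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            if dist[u] >= best:
                break  # BFS order: nothing further from src can beat the current best
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    if label[w] == label[src]:  # w != src, since src is already in dist
                        best = min(best, dist[w])
                    queue.append(w)
    return best
```

On the path of Example 6.3 with $k = 4$ (labels 1, 2, 3, 4, 5, 1 along the path) it returns $5 = k + 1$.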

We remark that paging algorithms get only the names of the requested pages and not the names of the vertices. However, for $\lambda(G) > 2$ and assuming the starting vertex is known, online algorithms can easily reconstruct the sequence of requested vertices from the sequence of requested pages.
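The reconstruction mentioned above is a single lookup per request: when $\lambda(G) > 2$, the current vertex and its neighbours carry pairwise distinct labels (any two of them are at distance at most 2), so at most one of them matches the next requested page. A minimal sketch, with the same kind of hypothetical adjacency and label dictionaries:

```python
from typing import Dict, Hashable, List

def reconstruct_vertices(adj: Dict[Hashable, List[Hashable]],
                         label: Dict[Hashable, Hashable],
                         start: Hashable, pages: List[Hashable]) -> List[Hashable]:
    """Recover the walk from its page labels, given the known start vertex.
    Assumes lambda(G) > 2, so the matching candidate is unique."""
    walk, cur = [], start
    for page in pages:
        # a request either stays at the current vertex or moves along an edge
        candidates = [v for v in [cur, *adj[cur]] if label[v] == page]
        if len(candidates) != 1:
            raise ValueError(f"request {page!r} is not a single legal step from {cur!r}")
        cur = candidates[0]
        walk.append(cur)
    return walk
```

For instance, on the path with labels 1, 2, 3, 4, 5, 1 and start vertex 1, the pages 2, 3, 4, 5, 1, 5, 4 decode to the vertices 2, 3, 4, 5, 6, 5, 4: the second request for page 1 is resolved to vertex 6 because it is the only neighbour of vertex 5 with that label.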






6.1 Truly Online Algorithms 



Theorems 1 and 2 extend to the extended access graph model.

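Theorem 3. For any $\varepsilon > 0$ and any extended access graph $G$ with $\lambda(G) \ge (1 + \varepsilon)k$,

$$r^{\infty}_{\mathrm{DTO}}(G, k) \le \max\{O(r^{\infty}(G, k)),\ 2/\varepsilon\},$$
$$r^{\infty}_{\mathrm{RTO}}(G, k) \le \max\{O(r^{\infty}_{\mathrm{obl}}(G, k)),\ 2/\varepsilon\}.$$

In particular, for any fixed $\alpha > 1$, Dto and Rto are very strongly competitive for any $G$ and $k$ satisfying $\lambda(G) \ge \alpha k$.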

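Proof. Let $g$ denote the number of new pages in the current phase. We consider two cases. If $g \ge \varepsilon k$, then Dto and Rto fault at most $k \le \frac{1}{\varepsilon} \cdot g = \frac{2}{\varepsilon} \cdot \frac{g}{2}$ times in the phase, while, amortized over phases, an optimal offline algorithm faults at least $g/2$ times per phase.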

If, on the other hand, $g < \varepsilon k$, then during the current phase and the previous phase there were no requests to two different vertices with the same label, so the graph $G_{ni}$ is an actual subgraph of $G$. By the proofs of Theorems 1 and 2, Dto faults at most $O(g \cdot r^{\infty}(G, k))$ times and Rto faults at most $O(g \cdot r^{\infty}_{\mathrm{obl}}(G, k))$ times in the phase. □

When $\lambda(G)$ is only slightly greater than $k$, Dto and Rto do not work well, as the following example shows.

Example 6.3. Consider the extended access graph $G = (V, E, \ell)$, where $V = \{v_1, v_2, \ldots, v_{k+1}, v_{k+2}\}$, $E = \{v_i v_{i+1} : 1 \le i \le k+1\}$, and

$$\ell(v_i) = \begin{cases} i & i \le k+1, \\ 1 & i = k+2. \end{cases}$$

Here $\lambda(G) = k + 1$. Consider the request sequence $\sigma = J_1 J_2 J_3 \cdots$, where $J_{2i-1} = 1, 2, \ldots, k$ and $J_{2i} = k+1, 1, k+1, k, \ldots, 2$. Note that $(J_i)_i$ is the phase partitioning of $\sigma$. Clearly, $r(G, k) = O(1)$. In contrast, Dto and Rto fault $\Omega(\log k)$ times [in expectation] during subphase III in each phase, and therefore $r^{\infty}_{\mathrm{DTO}}(G, k)$ and $r^{\infty}_{\mathrm{RTO}}(G, k)$ are $\Omega(\log k)$.



6.2 Very strongly competitive algorithm for paths with k + 1 pages 

In this section we present a very strongly competitive algorithm for graphs that are simple paths 
on k + 1 pages. This algorithm will serve us in proving impossibility results concerning truly online 
algorithms. 

Let $G = (v_1, v_2, \ldots, v_m)$ be a finite simple path with a surjective label function $\ell : \{v_i : 1 \le i \le m\} \to \mathcal{P}$, where $|\mathcal{P}| = k + 1$. Fix a vertex $v_i$. We define a partial order $\prec_i$ on $\mathcal{P}$: $p \prec_i q$ if any path from $v_i$ to a vertex labeled by $q$ must include a vertex labeled by $p$. It is easy to verify that $\prec_i$ is indeed a partial order. Let $M_i$ be the set of maximal elements of $\prec_i$. Note that:

1. $M_i$ includes at most two pages that appear only on one side of $v_i$ in $G$.

2. Any request sequence that starts at $v_i$ and accesses all the pages in $M_i$ must access all the pages in $\mathcal{P}$.
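For intuition, $\prec_i$ and $M_i$ can be computed directly on a small labeled path. On a simple path every route from $v_i$ to a $q$-labeled vertex passes through the whole segment up to the nearest $q$-labeled vertex on the side it takes, so $p \prec_i q$ exactly when $p$ occurs in that segment on every side where $q$ occurs. The function below (its name and list-of-labels input are illustrative assumptions) uses this characterization; it is quadratic, which is fine for examples.

```python
from typing import Dict, Hashable, List, Set

def maximal_pages(labels: List[Hashable], i: int) -> Set[Hashable]:
    """Return M_i for a simple path whose j-th vertex carries labels[j]."""
    pages = set(labels)
    # pre[q]: pages that every route from v_i to a q-labeled vertex must visit
    pre: Dict[Hashable, Set[Hashable]] = {}
    for q in pages:
        sides = []
        for step in (-1, 1):  # scan left, then right, starting at v_i
            seen: Set[Hashable] = set()
            j = i
            while 0 <= j < len(labels):
                seen.add(labels[j])
                if labels[j] == q:  # nearest q-labeled vertex on this side
                    sides.append(seen)
                    break
                j += step
        pre[q] = set.intersection(*sides)  # p <_i q  iff  p in pre[q]
    # q is maximal if it lies below no other page
    return {q for q in pages if not any(q in pre[r] for r in pages if r != q)}
```

For example, on the labels `['b', 'c', 'a', 'b', 'c']` viewed from the middle vertex, both `b` and `c` appear on both sides and neither forces the other, so $M_i = \{b, c\}$.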
We define linear orders on $M'_i = \{p \in M_i : p \text{ appears on both sides of } v_i \text{ in } G\}$. For $p, q \in M'_i$, $p \prec^L_i q$ if, when going in $G$ from $v_i$ to the left, we reach a vertex labeled with $p$ before we reach a vertex labeled with $q$. Analogously, $p \prec^R_i q$ is defined to the right. Note that $p \prec^L_i q$ if and only if $q \prec^R_i p$.

Lemma 6.4. $r^{\infty}_{\mathrm{obl}}(G, k) = \Omega(\max_i \log|M_i|)$.

Proof. Let $i_0 = \arg\max_i |M_i|$, so $|M_{i_0}| > 0$. Using Yao's principle (cf. [...], Ch. 8), we construct a probability distribution on request sequences by the following iterative process. The request sequence is composed of periods, and each period is composed of sub-periods. Let $N_j$ be the set of $\prec_{i_0}$-maximal pages that were left unmarked before the $j$-th sub-period begins. At the beginning of a period, $N_1 = M_{i_0}$.

We describe the request sequence in the $j$-th sub-period. The adversary begins from $v_{i_0}$. It chooses a page $p \in N_j$ such that $p$ is smaller than $|N_j|/2 + 1$ pages of $N_j$ under both $\prec^L_{i_0}$ and $\prec^R_{i_0}$ (its existence follows from a simple counting argument). If $p$ appears on both sides of $v_{i_0}$ in $G$, then the adversary chooses the left or the right side uniformly at random; otherwise it chooses the side on which $p$ appears. The adversary then requests the vertices in that direction until reaching the first vertex $v$ labeled with $p$, and then returns to $v_{i_0}$. At this point the $j$-th sub-period ends. Note that $|N_{j+1}| \ge |N_j|/2 - 1$. The adversary continues this way until $N_j = \emptyset$, which means that all pages were requested during the period. At this point the period ends, and the adversary begins a new period.

During a period, an optimal off-line algorithm faults at most twice, because the request sequence 
consists of at most two phases. 

Next we show that any online algorithm incurs an expected $\Omega(\log|M_{i_0}|)$ faults during a period.

To prove it, we argue that there are $\Omega(\log|M_{i_0}|)$ sub-periods in a period, and that an online algorithm has an expected cost of at least $1/2$ in each sub-period, except maybe two of them. There are $\Omega(\log|M_{i_0}|)$ sub-periods before $N_j$ becomes empty, because in each sub-period the size of $N_j$ is roughly halved. In all sub-periods in which the target label $p$ appears on both sides of $v_{i_0}$, the expected number of faults of the online algorithm is at least $1/2$, since $\prec_{i_0}$-maximal pages that appear on both sides of $v_{i_0}$ must appear, on at least one side of $v_{i_0}$, after the hole. As all $\prec_{i_0}$-maximal pages appear on both sides of $v_{i_0}$, except maybe two, the claim follows. □

We now present a very strongly competitive deterministic marking online algorithm called Maxfar.

MAXFAR. Maxfar is a marking algorithm. Assume the current phase began at $v_i$. On the $j$-th fault in the phase, let $N_j \subseteq M_i$ be the set of unmarked $\prec_i$-maximal elements. Choose a page $p \in N_j$ in the middle of $N_j$ according to $\prec^L_i$ (which is also in the middle according to $\prec^R_i$) and evict it. If $N_j$ contains only pages that appear on only one side of $v_i$, evict one of them (there are at most two such pages).
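The eviction rule of Maxfar can be sketched as follows. The sketch assumes the caller supplies $M'_i$ already sorted by $\prec^L_i$, the (at most two) one-sided maximal pages separately, and the current set of unmarked pages; the marking bookkeeping itself is omitted, and the function name is hypothetical.

```python
from typing import Hashable, Iterable, List, Optional, Set

def maxfar_victim(two_sided_by_left: List[Hashable], one_sided: Iterable[Hashable],
                  unmarked: Set[Hashable]) -> Optional[Hashable]:
    """Page that MAXFAR evicts on a fault.

    two_sided_by_left: M'_i (maximal pages seen on both sides of v_i), sorted by <^L_i.
    one_sided: the at most two maximal pages that appear on only one side of v_i.
    unmarked: pages not yet marked in the current phase.
    """
    n_j = [p for p in two_sided_by_left if p in unmarked]
    if n_j:
        # the middle element under <^L_i is also the middle one under <^R_i
        return n_j[len(n_j) // 2]
    rest = [p for p in one_sided if p in unmarked]
    return rest[0] if rest else None
```

Evicting the median is what forces the halving: whichever side the next request comes from, at most about half of the two-sided candidates stay unmarked.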

It is easy to see that $(N_j)_j$ is a decreasing sequence of sets, and $|N_{j+1}| \le |N_j|/2 + 1$. Thus after at most $j = O(\log|M_i|)$ faults, $N_j = \emptyset$, and at this point the phase is over. We conclude that $r_{\mathrm{MAXFAR}}(G, k) = O(\max_i \log|M_i|)$. Putting this together:

Theorem 4. For any extended access graph $G$ which is a simple path on $k + 1$ pages, $r_{\mathrm{MAXFAR}}(G, k) = O(r^{\infty}_{\mathrm{obl}}(G, k))$.
6.3 Impossibility Results

In this section we show that no truly online algorithm can be very strongly competitive on extended access graphs when $\lambda(G)$ is slightly less than $k$. Formally, we prove:

Theorem 5.

1. For any $0 < f < k$ and any deterministic truly online paging algorithm $A$, there exists an extended access graph $G$ such that $\lambda(G) \ge k - f + 1$ and $r_A(G, k) \ge f$, but $r(G, k) = O(\log k)$. In particular, for $f = f(k) = \omega(\log k)$, $r_A(G, k) = \omega(r(G, k))$.

2. For any $0 < f < k$ and any randomized truly online paging algorithm $A$, there exists an extended access graph $G$ such that $\lambda(G) \ge k - f - 1$ and $r_A(G, k) = \Omega(\log f)$, but $r_{\mathrm{obl}}(G, k) = O(\log k - \log f + \log\log f)$. In particular, for $f = f(k) = k^{1 - o(1)}$, $r_A(G, k) = \omega(r_{\mathrm{obl}}(G, k))$.

Proof. Fix $0 < f' = f + 1 < k$ and a truly online algorithm $A$.

Consider the following part of an extended access graph labeled with k + 1 pages: 

$$1, 2, \ldots, k, k+1, x_1, x_2, \ldots, x_{f'}.$$

The adversary maintains the invariant $\{x_i : 1 \le i \le f'\} = \{1, \ldots, f'\}$, but in a permutation that will be determined.

To prove part (1), assume $A$ is deterministic. We split the request sequence $\sigma$ into phases and show that each phase costs $A$ at least $f'$, whereas it costs the adversary only 1 (except the first phase, in which the cost for the adversary is $k$). We will also show that there exists an online algorithm that obtains a competitive ratio of $O(\log k)$ on this graph. By making the request sequence long enough (for $c$ phases, where $c$ satisfies $k + (c - 1)f' > fc$), the first part of the theorem will be proved.

The adversary works in phases. Assume that in the previous phase the requested pages were $\{1, \ldots, k\}$. Each phase is composed of $f'$ sub-phases, indexed by $j$.

For $j = 0$, the adversary requests $x_0 = k + 1$. As $A$ is deterministic, the adversary knows which page has been evicted by $A$ to satisfy this request. Denote it by $p_0$. In general, in each sub-phase the adversary requests a hole of $A$, and therefore, to serve the request, $A$ must evict at least one page from its real memory. Denote a page evicted by $A$ in the $j$-th sub-phase by $p_j$.

In the $j$-th sub-phase the adversary does as follows. If $p_{j-1} > f'$, then the adversary requests the pages

$$x_{j-1}, x_{j-2}, \ldots, x_0 = k+1, k, \ldots, p_{j-1}+1, p_{j-1}, p_{j-1}+1, p_{j-1}+2, \ldots, k, x_0 = k+1, x_1, \ldots, x_{j-1}.$$

It then sets $x_j$ to be an arbitrary page in $\{1, \ldots, f'\} \setminus \{x_1, \ldots, x_{j-1}\}$, and requests $x_j$. This is a legitimate traversal of the extended access graph.

If $p_{j-1} \le f'$, then: (i) if $p_{j-1}$ has not been requested yet in the current phase, the adversary sets $x_j = p_{j-1}$ and requests it; (ii) otherwise there must be some $i < j$ such that $x_i = p_{j-1}$, and the adversary generates the requests $x_{j-1}, x_{j-2}, \ldots, x_i, x_{i+1}, \ldots, x_{j-1}$. It then sets $x_j$ to be an arbitrary page in $\{1, \ldots, f'\} \setminus \{x_1, \ldots, x_{j-1}\}$ and requests $x_j$.

When $j = f'$, the phase ends; $k$ different pages have been requested in this phase. $A$ had a fault in each sub-phase, and therefore its cost has been at least $f'$.

The above argument is for one phase, but we can continue this process for an additional $c$ phases, for any $c$, simply by adding an additional $c \cdot f'$ vertices on the right-hand side (to the right of $x_{f'}$ above).

Note that (i) $\lambda(G) \ge k - f'$ for any such graph; (ii) $G$ is a simple path on $k + 1$ pages, so by Theorem 4, Maxfar has a competitive ratio of $O(\log k)$.
Next, we turn to prove an impossibility result for randomized truly online algorithms. The adversary constructs a graph similarly to the previous case, but now it does not know where the hole is. Instead, it resorts to a random process as follows. It maintains a maximal unrequested sub-path $(i_1, \ldots, i_2)$ of the vertices labeled $1, \ldots, f'$. In each sub-phase it first requests all the pages already requested during the phase. The adversary then sets $i_3 = (i_1 + i_2)/2$, the midpoint vertex of the unrequested segment, and chooses uniformly at random one of the two:

• Requesting the pages $\{i_3, \ldots, i_2\}$ by setting $x_{j+1} = i_2$, $x_{j+2} = i_2 - 1$, $\ldots$, $x_{j + (i_2 - i_3) + 1} = i_3$, and then setting $j \leftarrow j + (i_2 - i_3) + 1$.

• Requesting the pages $\{i_1, \ldots, i_3\}$ by setting $x_{j+1} = i_3$, $x_{j+2} = i_3 - 1$, $\ldots$, $x_{j + (i_3 - i_1) + 1} = i_1$, and then setting $j \leftarrow j + (i_3 - i_1) + 1$.

After approximately $\log f'$ sub-phases the phase ends. In each sub-phase, $A$ faults with probability at least $1/2$, and therefore the expected cost of $A$ in a phase is $\Omega(\log f')$.
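The random halving used by the adversary can be simulated to see why a phase lasts about $\log f'$ sub-phases. The sketch below tracks only the unrequested segment $[i_1, i_2]$ of the labels $1, \ldots, f'$ (the re-requests of pages already seen in the phase are not modelled), and the function name is illustrative.

```python
import random

def adversary_subphases(f_prime: int, rng: random.Random) -> int:
    """Number of sub-phases in one phase of the randomized adversary (sketch)."""
    lo, hi, count = 1, f_prime, 0       # [lo, hi] is the unrequested segment [i_1, i_2]
    while lo <= hi:
        mid = (lo + hi) // 2            # i_3, the midpoint of the unrequested segment
        if rng.random() < 0.5:
            hi = mid - 1                # requested mid..hi; lo..mid-1 stays unrequested
        else:
            lo = mid + 1                # requested lo..mid; mid+1..hi stays unrequested
        count += 1
    return count
```

With $f' = 1024$ every run takes 10 or 11 sub-phases, since a segment of size $n$ shrinks to $\lfloor (n-1)/2 \rfloor$ or $\lceil (n-1)/2 \rceil$ at each step.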

As in the deterministic case, this process can be continued for an additional $c$ phases, for any $c$, simply by adding additional vertices on the right-hand side.

Note also that the construction maintains $\lambda(G) \ge k - f'$, and that $G$ is a simple path on $k + 1$ pages. Thus Maxfar is applicable in this scenario.

Assume $i > k + 1$. To bound $M_i$, denote by $t$ the first time $v_i$ is reached by the adversary. Each sub-phase of the adversary after time $t$ contributes at most one maximal element to $\prec_i$, and thus at most $\lceil \log f' \rceil$ maximal elements are added to $M_i$ in each phase after time $t$. However, after $\lceil (k+1)/f' \rceil$ phases since time $t$, all the pages also appear to the right of $v_i$, since each phase adds $f'$ distinct pages to the right of $v_i$. Therefore $|M_i| \le \lceil k/f' \rceil \lceil \log f' \rceil$. Note that a similar argument also holds for $i \le k$. Thus $r_{\mathrm{MAXFAR}}(G, k) = O(\log k - \log f' + \log\log f')$. □

7 Implementation 

Storing $G_0$ requires only $O(k \log n)$ bits. We also need to keep track of:

1. The vertices requested so far in the current phase.

2. The current unevicted vertices of $G_0$.

3. $G_0$ for the next phase.

One way to construct $G_0$ for the next phase is to store a pointer from every requested page to the previously requested page. This pointer is updated only once in a phase (the first time the page is accessed during the phase). The resulting data structure is a tree with pointers from the leaves upwards, whose root is the first page requested in the phase.³ At the end of a phase, the pointers are scanned and an image of $G_0$ is built in memory. This image contains vertices, pointers, and the virtual addresses associated with the vertices. In total (hardware registers and memory), the memory required is $O(k \log n)$ bits.
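The per-phase pointer structure can be sketched in a few lines. This is an illustrative model only: it keys pointers by page identifiers in a dictionary, whereas the text envisions one pointer per page slot in hardware, and the class name `PhaseTree` is hypothetical.

```python
from typing import Dict, Hashable, List, Optional, Tuple

class PhaseTree:
    """Parent pointers recorded during one phase (illustrative model)."""

    def __init__(self) -> None:
        self.parent: Dict[Hashable, Optional[Hashable]] = {}
        self.prev: Optional[Hashable] = None

    def request(self, page: Hashable) -> None:
        # the pointer is written only on the first access to the page in this phase
        if page not in self.parent:
            self.parent[page] = self.prev  # None marks the root, the phase's first page
        self.prev = page

    def edges(self) -> List[Tuple[Hashable, Hashable]]:
        """(child, parent) pairs, scanned at the end of the phase to build G_0."""
        return [(c, p) for c, p in self.parent.items() if p is not None]
```

For the requests a, b, a, c, b, d the tree has root a, with b and c pointing to a and d pointing to b; one pointer per page is all that is stored.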

³ An alternative approach is to store for each page a pointer to the next page, and update it each time the page is accessed. We then get a tree rooted at the last requested page in the phase. This tree seems to capture the locality of reference better, as it stores edges resulting from more recent requests. However, it also requires more pointer-update operations.
There is a rather efficient hardware implementation for processing page hits. The $\log n$-bit virtual address is translated to a $\log k$-bit page slot address (by the virtual address translation mechanism). The spanning tree is built on the $\log k$-bit address space of page slots, but the virtual addresses ($\log n$ bits) are stored in the vertices. Hence, the hardware storage requirement for implementing page hits in Dto and Rto is one $\log k$-bit pointer and one extra bit (the marking bit) associated with every page slot. The extra hardware processing associated with interpreting a page hit is one $\log k$-bit address comparison, one bit comparison, and (possibly) setting one $\log k$-bit pointer.

Dto and Rto, as presented here, use a spanning tree of $G_p$ as their reference access graph. It is easy to check that we could instead have used any connected subgraph of $G_p$. We can even use $G_p$ itself and store it using only $O(k \log n)$ bits (instead of the $O(k^2 \log n)$ bits of the naive implementation). To do that, observe that there is no need to actually store edges both of whose endpoints have degree $\ge 3$. Thus, when a new edge is revealed, we increase the degrees of its endpoints⁴ and "forget" all the edges in which both endpoints have degree $\ge 3$. In this way only $2k$ edges need to be stored explicitly.
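The degree-capping rule can be sketched as follows; `DegreeCappedGraph` is a hypothetical name, degrees saturate at 3 as in footnote 4, and the edge set is rebuilt after every insertion for clarity rather than speed. Every stored edge has an endpoint of degree at most 2, so at most $2k$ edges survive on $k$ vertices.

```python
from typing import Dict, FrozenSet, Hashable, Set

class DegreeCappedGraph:
    """Keep only edges that have an endpoint of (saturated) degree below 3."""

    def __init__(self) -> None:
        self.deg: Dict[Hashable, int] = {}           # saturating counter: 1, 2, or 3 (">= 3")
        self.edges: Set[FrozenSet[Hashable]] = set()  # explicitly stored edges

    def reveal(self, u: Hashable, v: Hashable) -> None:
        e = frozenset((u, v))
        if u == v or e in self.edges:
            return
        for x in (u, v):
            self.deg[x] = min(3, self.deg.get(x, 0) + 1)
        self.edges.add(e)
        # forget every edge whose endpoints both have degree >= 3
        self.edges = {f for f in self.edges if any(self.deg[x] < 3 for x in f)}
```

On a path every edge is kept, while on a dense graph most edges are forgotten once their endpoints saturate.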

It would be of theoretical interest to reduce the memory requirements even further, so that they are independent of $n$. Our goal is to reduce the factor of $\log n$ in the storage requirement to $\log k$. Of course, the address translation table must still use $\Omega(k \log n)$ bits. But when we restrict ourselves to the page eviction strategy, and assume we are also told reliably whether a page request is a hit or a fault, we can allow small errors, provided their total effect on the number of page faults is insignificant.

Consider a universal set of hash functions $h : \mathbb{N} \to \{0, \ldots, m-1\}$, e.g., $h(x) = ax + b \pmod{m}$, where $m$ is prime and $a$ and $b$ are sampled uniformly and independently from $\mathbb{Z}_m$. In Rto we replace the virtual address of a page $p$ with its hash value $h(p)$.

In this case Rto may err and identify two different pages as the same. In the worst case, this would "confuse" Rto in the current and the next phase (because $G_0$ for the next phase is built incorrectly). Nonetheless, since Rto has the marking property, the damage is restricted to these two phases, and so such an error would add at most $2k$ page faults.

The probability that such a bad event happens is bounded from above by the following simple argument: consider the $\binom{2k}{2}$ pairs of different pages requested during the last two phases. The probability that a pair of different pages collides is $1/m$; therefore the probability of an error occurring during a phase is at most $\binom{2k}{2}/m = O(k^2/m)$. Choosing $m = \Theta(k^3)$ ensures that the expected number of page faults added due to collisions of hash values is $O(k \cdot k^2/k^3) = O(1)$ per phase.
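A sketch of the hashed page identifiers follows. It departs from the example family above in one respect, stated here as an assumption: $ax + b \pmod{m}$ sends any two page numbers that are congruent modulo $m$ to the same value, so when page numbers range up to $n \gg m$ the sketch uses the standard Carter-Wegman family $h(x) = ((ax + b) \bmod p) \bmod m$ with a prime $p \ge n$, $a \in \{1, \ldots, p-1\}$ and $b \in \{0, \ldots, p-1\}$, whose collision probability for two distinct keys is at most $1/m$. Storing $a$, $b$ and $p$ costs $O(\log n)$ extra bits. The function names are illustrative.

```python
import random
from typing import Callable, Tuple

def next_prime(n: int) -> int:
    """Smallest prime >= n (trial division; adequate for the sizes used here)."""
    def is_prime(x: int) -> bool:
        if x < 2:
            return False
        i = 2
        while i * i <= x:
            if x % i == 0:
                return False
            i += 1
        return True
    while not is_prime(n):
        n += 1
    return n

def make_page_hash(k: int, n: int, rng: random.Random) -> Tuple[int, Callable[[int], int]]:
    """Hash page numbers in [0, n) into [0, m) with m = k^3 (Carter-Wegman family)."""
    m = k ** 3
    p = next_prime(max(n, m + 1))  # prime p >= n keeps distinct pages distinct mod p
    a = rng.randrange(1, p)
    b = rng.randrange(0, p)
    return m, lambda x: ((a * x + b) % p) % m
```

With $k = 16$ the hash values need $\log_2 4096 = 12$ bits per page slot instead of the full $\log n$-bit virtual address.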

Hence, a data structure with $O(k \log m) = O(k \log k)$ bits gives a randomized algorithm with the same performance guarantees as the original Rto algorithm, up to a constant factor.

8 Concluding Remarks 

In this paper we have studied the access graph model for locality of reference in paging. We have shown a somewhat surprising result: it is possible to be both truly online and very strongly competitive in the access graph model. The resulting algorithms seem practical, and can be proven to work well even when the locality of reference changes.
The following issues are not resolved.

⁴ The degree can be stored in only three states: "degree = 1", "degree = 2", and "degree ≥ 3".
1. The proof of Lemma 4.2 is very lengthy. Is there a simpler and/or shorter proof?

2. In a preliminary version of this work [...] we claimed that both Dto and Rto can be patched so as to be very strongly competitive for any $G$ and $k$ as long as $\lambda(G) > k$. The patch we devised turned out to be erroneous. Still, we conjecture that very strongly competitive truly online algorithms are possible when $\lambda(G) > k$.

3. Finding a uniform and very strongly competitive algorithm for the extended access graph model.

4. Is there a very strongly competitive algorithm for the directed access graph model? Or is it 
computationally a hard problem? 